OpenBlock 1.2 documentation

parser Package

parser Package

cities Module

models Module

numbered_streets Module

parsing Module

class ebpub.geocoder.parser.parsing.Location(*args)

Bases: dict

A dict-like object with only a few valid keys: ('number', 'pre_dir', 'prefix', 'street', 'suffix', 'post_dir', 'city', 'state', 'zip')
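
The key restriction described above can be illustrated with a small dict subclass. This is only a sketch, not the OpenBlock implementation: the class name RestrictedLocation, the choice to pre-fill every key with None, the acceptance of keyword arguments, and the use of KeyError for unknown keys are all assumptions made for the example.

```python
# Hypothetical illustration; names and behaviour are assumptions,
# not taken from the ebpub source.
VALID_KEYS = ('number', 'pre_dir', 'prefix', 'street', 'suffix',
              'post_dir', 'city', 'state', 'zip')


class RestrictedLocation(dict):
    """Dict subclass that only accepts the documented address keys."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Assumption: every valid key starts out present and unset.
        for key in VALID_KEYS:
            dict.__setitem__(self, key, None)
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Reject anything outside the documented key set.
        if key not in VALID_KEYS:
            raise KeyError("invalid location key: %r" % (key,))
        dict.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # Route bulk updates through __setitem__ so validation applies.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


loc = RestrictedLocation({'number': '1972', 'street': 'DAWSON'})
```

Assigning an unknown key such as loc['planet'] raises KeyError in this sketch.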

class ebpub.geocoder.parser.parsing.Standardizer(d)

Bases: object

Replaces a suffix, directional, state, etc. with the preferred standard form.

For example, given the text "avenu" for suffixes, returns "AVE".

>>> suff_standardizer = Standardizer(suffixes)
>>> suff_standardizer("avenu")
'AVE'
>>> dir_standardizer = Standardizer(DIRECTIONALS)
>>> dir_standardizer("north")
'N'
>>> dir_standardizer("n")
'N'
>>> pre_standardizer = Standardizer(prefixes)
>>> pre_standardizer("US hwy")
'US HIGHWAY'
>>> pre_standardizer("SR")
'STATE ROUTE'
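
One way to picture such a standardizer is as a reverse lookup table built from a mapping of standard forms to their variants. The sketch below is an assumption about the general technique, not the actual class: SimpleStandardizer, the SUFFIXES table, and the choice to return unknown input unchanged are all hypothetical.

```python
# Hypothetical table: standard form -> accepted variants (str or list).
SUFFIXES = {
    'AVE': ['AV', 'AVEN', 'AVENU', 'AVENUE'],
    'ST': ['STR', 'STREET'],
    'RD': 'ROAD',
}


class SimpleStandardizer:
    """Callable that maps any known variant to its standard form."""

    def __init__(self, table):
        self.lookup = {}
        for standard, variants in table.items():
            if isinstance(variants, str):
                variants = [variants]
            standard = standard.upper()
            # The standard form maps to itself.
            self.lookup[standard] = standard
            for variant in variants:
                self.lookup[variant.upper()] = standard

    def __call__(self, text):
        # Assumption: unknown input is returned unchanged.
        return self.lookup.get(text.upper(), text)


suff_standardizer = SimpleStandardizer(SUFFIXES)
print(suff_standardizer('avenu'))  # AVE
```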

ebpub.geocoder.parser.parsing.abbrev_regex(d, case_insensitive=True, matches_entirely=True)

Returns a regular expression pattern that matches an abbreviation:

>>> suffixes = {
...     'av': ['ave', 'avenue'],
...     'st': ['str', 'street'],
...     'rd': 'road'
... }
>>> regex = abbrev_regex(suffixes)
>>> re.search(regex, "Ave")  
<_sre.SRE_Match object at ...>
>>> re.search(regex, " Ave ") == None
True
>>> regex = abbrev_regex(suffixes, case_insensitive=False)
>>> re.search(regex, "str")  
<_sre.SRE_Match object at ...>
>>> re.search(regex, "Str") == None
True
>>> regex = abbrev_regex(suffixes, matches_entirely=False)
>>> re.search(regex, " Road ")  
<_sre.SRE_Match object at ...>
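
A pattern with the behaviour shown in these doctests could be assembled roughly as follows. This is a sketch under assumptions, written for Python 3: build_abbrev_regex is a hypothetical name, and the real function may construct its pattern differently.

```python
import re


def build_abbrev_regex(table, case_insensitive=True, matches_entirely=True):
    """Return a pattern matching any abbreviation or variant in table."""
    tokens = set()
    for abbrev, variants in table.items():
        tokens.add(abbrev)
        # Values may be a single string or a list of strings.
        if isinstance(variants, str):
            variants = [variants]
        tokens.update(variants)
    # Longest alternatives first, so 'avenue' is tried before 'av'.
    ordered = sorted(tokens, key=lambda t: (-len(t), t))
    pattern = '(?:%s)' % '|'.join(re.escape(t) for t in ordered)
    if matches_entirely:
        # Anchor the pattern so the whole string must be the token.
        pattern = '^%s$' % pattern
    if case_insensitive:
        # The inline flag has to come first in the pattern.
        pattern = '(?i)' + pattern
    return pattern


suffixes = {'av': ['ave', 'avenue'], 'st': ['str', 'street'], 'rd': 'road'}
regex = build_abbrev_regex(suffixes)
print(re.search(regex, 'Ave') is not None)  # True
print(re.search(regex, ' Ave ') is None)    # True
```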

ebpub.geocoder.parser.parsing.address_combinations()

Generator that yields a list of strings for every possible combination of address tokens. For example:

['number', 'pre_dir', 'street']
['number', 'street', 'city', 'state']

There were about 6240 combinations at last count.
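
The general idea of generating ordered token combinations can be sketched with itertools. This is a deliberately simplified illustration: the name simple_address_combinations and the rule that 'street' is always required are assumptions, and the sketch yields only 256 combinations, so it does not reproduce whatever rules produce the figure quoted above.

```python
from itertools import combinations

# Token names in the order they are assumed to appear in an address.
TOKEN_ORDER = ('number', 'pre_dir', 'prefix', 'street', 'suffix',
               'post_dir', 'city', 'state', 'zip')


def simple_address_combinations(required=('street',)):
    """Yield every order-preserving subset that contains the required tokens."""
    for size in range(1, len(TOKEN_ORDER) + 1):
        # combinations() keeps the input order within each result.
        for combo in combinations(TOKEN_ORDER, size):
            if all(name in combo for name in required):
                yield list(combo)


combos = list(simple_address_combinations())
print(['number', 'pre_dir', 'street'] in combos)  # True
```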


ebpub.geocoder.parser.parsing.normalize(location)

Normalizes an address string for parsing and comparison.

>>> normalize(u"1972 n. dawson ave. chicago il")
u'1972 N DAWSON AVE CHICAGO IL'
>>> normalize(u"1972 n. dawson ave., chicago il")
u'1972 N DAWSON AVE CHICAGO IL'
>>> normalize(u"n kimball ave & w diversey ave")
u'N KIMBALL AVE & W DIVERSEY AVE'
>>> normalize(None)
u'NONE'
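
The doctests above are consistent with a normalizer that converts its input to text, uppercases it, drops periods and commas, and collapses whitespace. The following Python 3 sketch reproduces those four cases. The name simple_normalize and the exact set of stripped punctuation are assumptions, and the real function may handle other characters differently.

```python
import re


def simple_normalize(location):
    """Uppercase, strip periods and commas, and collapse whitespace."""
    # str() is why a None input comes back as 'NONE'.
    text = str(location).upper()
    # Assumption: only periods and commas are treated as noise.
    text = re.sub(r'[.,]', ' ', text)
    # split() with no arguments collapses runs of whitespace.
    return ' '.join(text.split())


print(simple_normalize('1972 n. dawson ave., chicago il'))
# 1972 N DAWSON AVE CHICAGO IL
```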

ebpub.geocoder.parser.parsing.number_standardizer(s)

Removes the second number in hyphenated addresses such as '123-02', as used in NYC. Note that this also removes the second number in address ranges and strips any non-digit prefix or suffix:

>>> number_standardizer('1-2')
'1'
>>> number_standardizer('100-200')
'100'
>>> number_standardizer('12A-12B')
'12'
>>> number_standardizer('x')
'x'
>>> number_standardizer('257b')
'257'
>>> number_standardizer('9L00')
'9'
>>> number_standardizer('W01')
'01'
>>> number_standardizer('9 8 7 6 5')
'9'
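
All of the cases above are consistent with a simple rule: keep the first run of digits, and return the input unchanged if it contains no digits. The sketch below implements that rule. The name first_number is hypothetical, and the real function may be written differently.

```python
import re


def first_number(s):
    """Return the first run of digits in s, or s itself if there is none."""
    match = re.search(r'\d+', s)
    return match.group() if match else s


for value in ('1-2', '12A-12B', 'x', 'W01', '9 8 7 6 5'):
    print(repr(first_number(value)))
```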

ebpub.geocoder.parser.parsing.parse(location)

Given a location string, return a list of possible valid results as Location instances.


ebpub.geocoder.parser.parsing.prefix_regex(case_insensitive=True, matches_entirely=True)

Returns a regex that matches any of the known prefix tokens:

>>> regex = prefix_regex()
>>> re.search(regex, 'HWY')   
<_sre.SRE_Match object at ...>
>>> re.search(regex, 'NY')   
<_sre.SRE_Match object at ...>
>>> print re.search(regex, 'nope')
None

ebpub.geocoder.parser.parsing.strip_unit(location)

Given an address string, strips the apartment number, suite number, etc.

>>> strip_unit('200 E 31st st')
'200 E 31st st'
>>> strip_unit('200 E 31st st unit 123')
'200 E 31st st'
>>> strip_unit('123 W broadway apt B')
'123 W broadway'
>>> strip_unit('99 s northshore drive apt. B')
'99 s northshore drive'
>>> strip_unit('45 carlton ave #12')
'45 carlton ave'
>>> strip_unit('148 lafayette st suite 13')
'148 lafayette st'
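
Unit stripping of this kind is commonly done with a regular expression anchored at the end of the string. The sketch below reproduces the six cases above. The name simple_strip_unit and the keyword list (unit, apt, suite, #) are assumptions drawn only from those examples, so the real function may recognise more designators.

```python
import re

# A trailing unit designator followed by a single token, e.g. " apt. B".
# The \b keeps ordinary words such as "united" from matching "unit".
UNIT_RE = re.compile(
    r'\s+(?:(?:unit|apt|suite)\b\.?|#)\s*\S+\s*$',
    re.IGNORECASE,
)


def simple_strip_unit(location):
    """Remove a trailing apartment, suite, or unit number if present."""
    return UNIT_RE.sub('', location)


print(simple_strip_unit('45 carlton ave #12'))  # 45 carlton ave
```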
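
ebpub.geocoder.parser.parsing.token_split()

This name appears to be bound to the findall method of a compiled regular expression, so its docstring is the standard one from Python's re module:

findall(string[, pos[, endpos]]) --> list. Return a list of all non-overlapping matches of pattern in string.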
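
A docstring like this typically shows up when a module-level name is bound directly to the findall method of a compiled pattern. The sketch below shows that idiom. The name split_tokens and the whitespace-based pattern are assumptions, since the pattern actually used is not shown here.

```python
import re

# Hypothetical pattern: one token per run of non-whitespace characters.
split_tokens = re.compile(r'\S+').findall

tokens = split_tokens('1972 N DAWSON AVE')
print(tokens)  # ['1972', 'N', 'DAWSON', 'AVE']
```

Because split_tokens is a bound method rather than a function defined with def, documentation tools report the docstring of findall itself.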

states Module

suffixes Module