Country, state/province/region, city from lat, long



Message boards : Number crunching : Country, state/province/region, city from lat, long

Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 806 - Posted: 2 Mar 2012 | 4:10:39 UTC
Last modified: 2 Mar 2012 | 4:11:56 UTC

See http://stackoverflow.com/questions/4013606/google-maps-how-to-get-country-state-province-region-city-given-a-lat-long-va.
____________

Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 821 - Posted: 4 Mar 2012 | 6:23:12 UTC

Here's a Python script that associates country, state and city info with detector IDs using the Google reverse geocoding service mentioned in the previous post. The script gets the detector ID numbers and lat/long coordinates in a crude way from the sample data provided in the "Raw data" thread. It would be better if the script had direct access to the database, but I created the script to test and evaluate Google's reverse geocoding service and demo how it could work, not how it should work. The script stops short of moving the raw sample data into compressed country files, but it does sort the existing detector IDs into country files. Moving the sample data itself is a trivial step.

The get_location() function might be useful for anyone wanting to keep the sample data in country files if they want to use Python.
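For anyone reusing the idea behind get_location(), here is a minimal sketch of the same extraction done with an XML parser instead of string slicing. It is written for Python 3, the name parse_location() is hypothetical, and SAMPLE_XML is a hand-written response in the shape the script expects, not real output from Google. Unlike the script, it keeps the first match for each field.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape the script expects (an assumption,
# not a captured response from the geocoding service).
SAMPLE_XML = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <address_component>
      <long_name>Lethbridge</long_name>
      <short_name>Lethbridge</short_name>
      <type>locality</type>
      <type>political</type>
    </address_component>
    <address_component>
      <long_name>Alberta</long_name>
      <short_name>AB</short_name>
      <type>administrative_area_level_1</type>
      <type>political</type>
    </address_component>
    <address_component>
      <long_name>Canada</long_name>
      <short_name>CA</short_name>
      <type>country</type>
      <type>political</type>
    </address_component>
  </result>
</GeocodeResponse>"""

NO_DATA = 'no data'


def parse_location(xml_text):
    """Return (city, province, country, short_country) from geocode XML."""
    city = province = country = short_country = NO_DATA
    root = ET.fromstring(xml_text)
    if root.findtext('status') != 'OK':
        return (city, province, country, short_country)
    for comp in root.iter('address_component'):
        types = [t.text for t in comp.findall('type')]
        long_name = comp.findtext('long_name', NO_DATA)
        # keep the first match for each field
        if 'locality' in types and city == NO_DATA:
            city = long_name
        elif 'administrative_area_level_1' in types and province == NO_DATA:
            province = long_name
        elif 'country' in types and country == NO_DATA:
            country = long_name
            short_country = comp.findtext('short_name', NO_DATA)
    return (city, province, country, short_country)


print(parse_location(SAMPLE_XML))
```

Any field the response lacks stays at 'no data', which matches how the script names its fallback file.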

From running the script I learned 2 interesting things:

1) There are only approximately 70 detectors in operation but there are over 1600 detector IDs!

2) The Google reverse geocoding service is amazing but it's far from perfect. It doesn't have location info for several detectors that I assume are located in populated regions. You would think there would be at least country data for them. The script throws those detector IDs into a file named no data.txt since there is no country info for them.

I will develop the script further if anybody would find it useful. Of course it's open source, free, no guarantees, yada yada yada. It's built on Python 2.7 and can be easily converted to run on Python 3.x.

--------------------------------------------------------------------------------------------------------------------------


'''
March 3, 2012
Dagorath

For each valid detector_ID (an ID that does not show 'Couldn't find computer' on its web page), get the
sample data web page for the ID and from that get the latitude and longitude (lat_long). Use the lat_long
in a request for reverse geocode data (location information) from Google as described at
http://stackoverflow.com/questions/4013606/google-maps-how-to-get-country-state-province-region-city-given-a-lat-long-va
and parse the city, state/province/district, country and short country info from the returned xml. Write the detector ID
to a file whose name is the detector's country with a .txt extension, in the directory assigned to the detector_2_country_data_dir
variable below. Write a log to id2ctry.log in the current directory. Other data lists are created but are not used at this time.

This script has a few limitations:

1) The script does not know how many valid detector IDs exist so I place an arbitrary limit on the number of IDs in the
while loop. I know the number is approximately 1600 so I've set the limit to 1700 plus an additional exit if it sees
10 invalid IDs in a row.

2) The country files are not automatically pruned or deleted. If you don't delete the files, existing IDs will be
appended to the country files with every script run and create numerous duplicate entries.

'''

from urllib import urlopen
from os.path import exists
from os import mkdir
from os.path import join
import time

#################################################################################################################
def get_lat_long(detector_ID):
    # Returns (lat, lon) as strings, (1, 1) on an HTTP error, or (2, 2) if the detector has no data.

    #time.sleep(.1)
    try:
        samples = urlopen('http://radioactiveathome.org/boinc/gettrickledata.php?hostid=' + str(detector_ID))
    except:
        log_it('{:*>5}'.format(detector_ID) + u' unknown HTTP exception in get_lat_long().')
        return (1, 1)

    count = 0
    for line in samples:
        count += 1
        line = line.decode('utf-8')
        line = line.strip(u'\n')
        if count == 1 and line == u':':
            # a first line holding only ':' means the detector has no samples
            log_it('{:*>5}'.format(detector_ID) + u' No data.')
            return (2, 2)
        elif count == 2:
            # the second line is the first sample row; lat and long are its 5th and 6th fields
            line = line.split(u',')
            return (line[4], line[5])

    # fewer than 2 lines came back, so there is no sample row to read; treat it as no data
    log_it('{:*>5}'.format(detector_ID) + u' No data.')
    return (2, 2)

#################################################################################################################
def log_it(x):
    # append one line to id2ctry.log in the current directory
    #time.sleep(.5)
    log_file = open(u'id2ctry.log', 'a')
    # log_file.write(time.strftime('%c') + u' ' + x + u'\n')
    log_file.write(x + u'\n')
    log_file.close()

    #print x

#################################################################################################################
def check_detector_ID(detector_ID):
    # Returns 0 if the website knows detector_ID, 1 on an HTTP error, 2 if the ID does not exist.

    #time.sleep(.5)
    try:
        test_page = urlopen(u'http://radioactiveathome.org/boinc/show_host_detail.php?hostid=' + str(detector_ID))
    except:
        log_it('{:*>5}'.format(detector_ID) + u' unknown HTTP exception in check_detector_ID().')
        return 1

    for line in test_page:
        line = line.decode('utf-8')
        # the page for an unknown ID contains "Couldn't find computer"
        if u't find com' in line:
            log_it('{:*>5}'.format(detector_ID) + u' does not exist.')
            return 2

    # the whole page was read without finding the error text, so the ID is valid
    return 0

################################################################################################################
def get_location(lat_long, detector_ID):
    # Returns a 4-tuple (city, province, country, short_country); any field may be u'no data'.

    city, province, country, short_country = u'no data', u'no data', u'no data', u'no data'

    #time.sleep(.5)
    try:
        xml = urlopen(u'http://maps.googleapis.com/maps/api/geocode/xml?latlng=' + lat_long[0] + ',' + lat_long[1] + '&sensor=false')
    except:
        log_it('{:*>5}'.format(detector_ID) + u' unknown HTTP exception in get_location().')
        # return the 'no data' tuple rather than 1 so the caller can always index the result
        return (city, province, country, short_country)

    # join the response into one string, then cut it into one chunk per address component
    x = ''
    for line in xml:
        line = line.decode('utf-8')
        line = line.strip(u'\n').strip()
        x += line

    x = x.split(u'</address_component>')

    if '<status>OK</status>' not in x[0]:
        log_it('{:*>5}'.format(detector_ID) + u' No OK status from Google in get_location()')
        return (city, province, country, short_country)

    # province will be what Google refers to as "administrative_area_level_1" in their xml
    # country is either long or short form, there is also "administrative_area_level_2" which is unknown to me and I
    # don't use it

    for i in range(1, len(x)):
        if u'<type>locality' in x[i]:
            city = x[i][x[i].find(u'long_name>') + 10 : x[i].find(u'</long_name')]
        elif u'<type>administrative_area_level_1' in x[i]:
            province = x[i][x[i].find(u'long_name>') + 10 : x[i].find(u'</long_name')]
        elif u'<type>country' in x[i]:
            country = x[i][x[i].find(u'long_name>') + 10 : x[i].find(u'</long_name')]
            short_country = x[i][x[i].find(u'short_name>') + 11 : x[i].find(u'</short_name')]
        if not city == u'no data' and not province == u'no data' and not country == u'no data':
            break # when all 3 are not null we have what we want else we search to end of xml

    return (city, province, country, short_country)

########################################################################################################################

detector_ID_with_lat_long_list = []
confirmed_detector_ID_list = []
detector_ID_with_location_list = []
invalid = 0

# name the data dir, that dir will then be created and all country data files will
# be stored in it
detector_2_country_data_dir = u'detector_2_country'
if not exists(detector_2_country_data_dir): mkdir(detector_2_country_data_dir)

start_time = time.strftime('%c')

log_it(u'Start id2ctry.')
detector_ID = 0

while detector_ID < 1700 and invalid < 10:

    time.sleep(.1)
    # check if detector_ID exists
    detector_ID += 1
    rc = check_detector_ID(detector_ID)
    if rc == 1: pass # got HTTP exception, nothing more to do for this detector_ID
    elif rc == 2: invalid += 1
    elif rc == 0:
        invalid = 0
        # we have a confirmed detector_ID, add it to the list of confirmed detector_IDs
        # then try to get a lat and long for it
        print detector_ID
        confirmed_detector_ID_list.append('{:0>5}'.format(detector_ID))
        rc = get_lat_long(detector_ID) # rc will be a 2-tuple
        if rc[0] == 1 or rc[0] == 2: continue # either an HTTP error occurred or detector_ID has no data
        else:
            # detector_ID exists and has a lat and long, now get its country, level 1 and possibly city from Google
            detector_ID_with_lat_long_list.append('{:<5}'.format(detector_ID))
            location = get_location(rc, detector_ID) # location will be a 4-tuple: city, province/state/district, country, short_country
            # any or all fields can = 'no data' in remote districts
            if not location[2] == u'no data': detector_ID_with_location_list.append(str(detector_ID) + u',' + location[2])
            # detectors without a country end up in "no data.txt"
            outfile = open(join(detector_2_country_data_dir, location[2].lower() + u'.txt'), u'a')
            try:
                outfile.write(str(detector_ID) + u',' + location[1] + u',' + location[0] + u',' + rc[0] + u',' + rc[1] + u'\n')
            except:
                # fall back to placeholders if the names cannot be written
                outfile.write(str(detector_ID) + u',no data,no data,' + rc[0] + u',' + rc[1] + u'\n')
            outfile.close()
print
print u'Finished ' + time.strftime('%c')
print u'Started ' + start_time
print
log_it(u'Confirmed detectors: ' + str(len(confirmed_detector_ID_list)))
log_it(u'Detectors with lat and long: ' + str(len(detector_ID_with_lat_long_list)))
log_it(u'Detectors with country: ' + str(len(detector_ID_with_location_list)))
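
Limitation 2 in the docstring (duplicate entries on every run) could be avoided by checking a country file before appending to it. The following is only a sketch under assumptions: it targets Python 3, append_unique() is a hypothetical helper that is not part of the script, and it treats the first comma-separated field of each line as the detector ID, which is how the script writes its lines.

```python
import os
import tempfile


def append_unique(path, line):
    """Append line to the file at path unless its detector ID is already there.

    The detector ID is taken to be the first comma-separated field.
    Returns True if the line was written, False if it was skipped.
    """
    key = line.split(',')[0].strip()
    if os.path.exists(path):
        with open(path) as f:
            seen = {existing.split(',')[0].strip() for existing in f}
        if key in seen:
            return False
    with open(path, 'a') as f:
        f.write(line.rstrip('\n') + '\n')
    return True


# Demo in a throwaway directory so no real country file is touched.
demo_file = os.path.join(tempfile.mkdtemp(), 'poland.txt')
row = '645,dolnoslaskie,legnica,51.226318,16.165159'
print(append_unique(demo_file, row))  # first run writes the row
print(append_unique(demo_file, row))  # second run skips the duplicate
```

Re-reading the file on every call is slow for large files, but the country files here hold at most a few dozen lines.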




____________

Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 833 - Posted: 10 Mar 2012 | 10:33:42 UTC

I now have the raw data sorted into files by country, state and city. The directory tree is below. The country named "lost" is the directory for detectors that have no latitude and longitude. Each filename consists of the detector_ID (host_ID) followed by the latitude and longitude. Detectors that are not in a city are stored in the folder named for their state/province. The tree has 75 files, one of which is valid_ids.txt, so 74 detectors are reporting data, though 8 of those are "lost". I will make the script send a PM to the owners of detectors that have had no coordinates for more than 7 days, 1 PM only. Since I have their country info I can probably use that to guess which language they speak, get a translation of the message into their language from Google and send it to them.

hi_dat
├── canada
│   └── alberta
│       └── lethbridge
│           └── 1016_49.663364_-112.798454.txt
├── germany
│   ├── baden-wurttemberg
│   │   ├── neulingen
│   │   │   └── 497_48.968441_8.728466.txt
│   │   ├── reutlingen
│   │   │   └── 628_48.564335_9.231973.txt
│   │   ├── rheinfelden
│   │   │   └── 880_47.582821_7.805476.txt
│   │   ├── rosenfeld
│   │   │   └── 954_48.265369_8.693833.txt
│   │   └── sindelfingen
│   │       ├── 646_48.695747_9.015518.txt
│   │       └── 647_48.726219_9.067961.txt
│   ├── bayern
│   │   ├── emmering
│   │   │   └── 481_48.189411_11.280191.txt
│   │   ├── hemau
│   │   │   └── 480_49.055630_11.788689.txt
│   │   ├── maxhütte-haidhof
│   │   │   └── 1464_49.197693_12.090973.txt
│   │   └── schongau
│   │       └── 505_47.804909_10.886885.txt
│   ├── bremen
│   │   └── bremen
│   │       └── 91_53.064697_8.812038.txt
│   ├── hessen
│   │   ├── buseck
│   │   │   └── 990_50.630737_8.826141.txt
│   │   ├── darmstadt
│   │   │   └── 1182_49.882027_8.658943.txt
│   │   └── rüsselsheim
│   │       └── 881_49.994568_8.412435.txt
│   ├── mecklenburg-vorpommern
│   │   └── stavenhagen
│   │       └── 935_53.703400_12.910266.txt
│   ├── niedersachsen
│   │   ├── goslar
│   │   │   └── 478_51.916584_10.470486.txt
│   │   └── osnabrück
│   │       └── 859_52.255898_8.005659.txt
│   ├── nordrhein-westfalen
│   │   ├── lübbecke
│   │   │   └── 694_52.306618_8.625813.txt
│   │   └── rheinbach
│   │       └── 499_50.588699_6.897968.txt
│   └── sachsen-anhalt
│       ├── dessau-roßlau
│       │   └── 905_51.798672_12.245851.txt
│       └── kalbe
│           └── 1276_52.683613_11.422345.txt
├── ireland
│   └── county_limerick
│       └── limerick
│           ├── 1488_52.660892_-8.620791.txt
│           └── 930_52.662342_-8.619976.txt
├── japan
│   └── saitama_prefecture
│       └── yoshikawa
│           └── 374_35.871613_139.845444.txt
├── lost
│   ├── 1252_0.000000_0.000000.txt
│   ├── 1567_0.000000_0.000000.txt
│   ├── 1608_0.000000_0.000000.txt
│   ├── 1613_0.000000_0.000000.txt
│   ├── 1650_0.000000_0.000000.txt
│   ├── 1651_0.000000_0.000000.txt
│   ├── 1655_0.000000_0.000000.txt
│   └── 311_0.000000_0.000000.txt
├── poland
│   ├── dolnoslaskie
│   │   ├── gmina_jelcz-laskowice
│   │   │   └── 1478_51.075649_17.283297.txt
│   │   ├── kudowa-zdrój
│   │   │   └── 2_50.443192_16.256702.txt
│   │   ├── legnica
│   │   │   └── 645_51.226318_16.165159.txt
│   │   └── wroclaw
│   │       ├── 1658_51.091232_16.986752.txt
│   │       └── 470_51.074856_17.043335.txt
│   ├── kujawsko-pomorskie
│   │   └── ciechocinek
│   │       └── 650_52.874920_18.791599.txt
│   ├── lodzkie
│   │   └── brzeziny
│   │       ├── 506_51.802658_19.751944.txt
│   │       └── 6_51.803467_19.753675.txt
│   ├── lubelskie
│   │   ├── łubki-kolonia
│   │   │   └── 495_51.241779_22.299938.txt
│   │   └── lublin
│   │       └── 1649_51.233978_22.535191.txt
│   ├── malopolskie
│   │   ├── gmina_brzeszcze
│   │   │   └── 635_49.975899_19.129429.txt
│   │   ├── kęty
│   │   │   └── 1486_49.882607_19.207621.txt
│   │   └── krakow
│   │       └── 642_50.064583_19.919947.txt
│   ├── mazowieckie
│   │   ├── siedlce
│   │   │   └── 1391_52.175282_22.285666.txt
│   │   ├── strzała
│   │   │   └── 5_52.196087_22.290573.txt
│   │   └── warsaw
│   │       ├── 1602_52.150356_21.042059.txt
│   │       ├── 1657_52.220558_21.097420.txt
│   │       └── 634_52.272110_21.015102.txt
│   ├── opolskie
│   │   └── opole
│   │       └── 1408_50.682198_17.937155.txt
│   ├── podkarpackie
│   │   └── krasne
│   │       └── 1332_50.051186_22.086365.txt
│   ├── podlaskie
│   │   └── gmina_michałowo
│   │       └── 516_53.029148_23.607838.txt
│   ├── pomorskie
│   │   └── gdynia
│   │       ├── 1218_54.506954_18.546532.txt
│   │       ├── 755_54.505760_18.549086.txt
│   │       └── 798_54.506855_18.545845.txt
│   ├── śląskie
│   │   ├── czechowice-dziedzice
│   │   │   └── 51_49.919102_18.998129.txt
│   │   ├── czestochowa
│   │   │   └── 689_50.785103_19.145651.txt
│   │   └── tychy
│   │       └── 633_50.126183_18.977777.txt
│   ├── wielkopolskie
│   │   └── poznan
│   │       └── 584_52.393932_16.905457.txt
│   └── zachodniopomorskie
│       ├── kolobrzeg
│       │   ├── 1094_54.177757_15.577068.txt
│       │   └── 1584_54.175571_15.548658.txt
│       └── szczecin
│           ├── 200_53.461983_14.567497.txt
│           ├── 821_53.461933_14.567298.txt
│           └── 875_53.463108_14.577853.txt
├── spain
│   └── andalucía
│       └── granada
│           └── 663_37.206364_-3.619376.txt
├── ukraine
│   └── zhytomyrs'ka
│       └── berdychiv
│           └── 1503_49.877823_28.588829.txt
├── united_kingdom
│   └── england
│       ├── 324_51.405174_-0.512280.txt
│       ├── church_warsop
│       │   └── 1635_53.214565_-1.164980.txt
│       ├── shireoaks
│       │   └── 7_53.322926_-1.158328.txt
│       └── worksop
│           └── 619_53.305107_-1.135454.txt
├── united_states
│   ├── california
│   │   └── dublin
│   │       └── 1077_37.705959_-121.957085.txt
│   └── idaho
│       └── twin_falls
│           └── 1103_42.568031_-114.426903.txt
└── valid_ids.txt

93 directories, 75 files
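
The layout in the tree can be produced from a detector's coordinates and its location tuple. This is a rough sketch for Python 3, not the actual sorting code: sample_path() is a hypothetical name, coordinates of 0, 0 are assumed to mean "lost", and names are assumed to be lowercased with spaces turned into underscores, as the folder names suggest.

```python
import os


def sample_path(root, detector_id, lat, lon, location):
    """Build root/country/state/city/ID_lat_long.txt for one detector.

    location is (city, province, country, short_country) as returned by
    the script's get_location(); parts equal to 'no data' are skipped.
    """
    lat, lon = float(lat), float(lon)
    name = '{}_{:.6f}_{:.6f}.txt'.format(detector_id, lat, lon)
    if lat == 0 and lon == 0:
        # assumption: 0, 0 means the detector has no coordinates
        return os.path.join(root, 'lost', name)
    city, province, country = location[0], location[1], location[2]
    parts = [root]
    for part in (country, province, city):
        if part != 'no data':
            parts.append(part.lower().replace(' ', '_'))
    parts.append(name)
    return os.path.join(*parts)


print(sample_path('hi_dat', 1016, '49.663364', '-112.798454',
                  ('Lethbridge', 'Alberta', 'Canada', 'CA')))
print(sample_path('hi_dat', 324, '51.405174', '-0.512280',
                  ('no data', 'England', 'United Kingdom', 'GB')))
```

The second call shows a detector outside any city landing directly in its state/province folder, like 324 under england in the tree.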

____________

Profile TJM
Project administrator
Project developer
Project tester
Joined: 16 Apr 11
Posts: 291
Credit: 1,368,737
RAC: 57

Message 834 - Posted: 10 Mar 2012 | 11:21:21 UTC - in response to Message 833.

I'll be implementing this on the server in the near future (not sure when yet, because there are lots of high priority things to do), however it has to be changed a bit.
I don't know if you noticed, but the location is not fixed for a sensor; instead the location is assigned to a single row of data.
It is done like that to support mobile sensors which are often moved from one place to another; also the server is ready to support sensors with built-in GPS receivers (which are planned, AFAIK the prototype is in early development stage already) where the position will be updated with every packet of data.
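
Because the position belongs to each row rather than to the sensor, a script can read it row by row. The sketch below (Python 3) groups consecutive rows that share the same coordinates. The rows are made up: only the positions of lat and long (fields 4 and 5, as the script above reads them) come from this thread, the other fields are placeholders, and split_by_position() is a hypothetical name.

```python
from itertools import groupby

# Made-up rows; only the lat/long positions (fields 4 and 5) follow the
# script in this thread, the remaining fields are placeholders.
ROWS = [
    'a,b,c,d,51.091232,16.986752',
    'a,b,c,d,51.091232,16.986752',
    'a,b,c,d,51.074856,17.043335',
]


def split_by_position(rows):
    """Group consecutive rows that share the same (lat, long) pair."""
    def position(row):
        fields = row.split(',')
        return (fields[4], fields[5])
    return [(pos, list(group)) for pos, group in groupby(rows, key=position)]


for pos, group in split_by_position(ROWS):
    print(pos, len(group))
```

Each group could then be geocoded once and stored under its own location.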


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 843 - Posted: 10 Mar 2012 | 21:03:34 UTC - in response to Message 834.

I am aware that each data sample has its own coordinates and that those coordinates can change, but since your detectors don't have GPS at this time, the script assumes the coordinates in the last data sample are the detector's current location and stores the samples accordingly.

Soon I will change the script to watch for a moving detector and store its data in some more or less sensible manner I haven't conceived yet. There are many possibilities and I'm open to suggestions.
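
One possible way to watch for a moving detector is to compare the coordinates of consecutive samples and flag a move when the great-circle distance exceeds some threshold. This is just a sketch for Python 3 under assumptions: the 1 km default threshold is arbitrary, has_moved() and haversine_km() are hypothetical names, and the test coordinates are borrowed from filenames in the tree above purely as sample values.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def has_moved(previous, current, threshold_km=1.0):
    """True if two (lat, long) pairs are more than threshold_km apart."""
    distance = haversine_km(previous[0], previous[1], current[0], current[1])
    return distance > threshold_km


# Two positions a few km apart, then two positions a few metres apart.
print(has_moved((51.091232, 16.986752), (51.074856, 17.043335)))
print(has_moved((53.461983, 14.567497), (53.461933, 14.567298)))
```

A threshold avoids treating small corrections to hand-entered coordinates as movement; what value is sensible depends on how owners enter their positions.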

____________



Copyright © 2024 BOINC@Poland | Open Science for the future