VDAT Geocoder for RcCollect, DSS and CAS-CW processing. Version October 7, 2018
This web site is designed to help Red Cross VDAT personnel clean up and geocode addresses. While it has other uses, its primary focus is processing data from RcCollect (the DDA field collector app), SS (Street Sheets), and data dumps from the CAS-CW database. Report problems or enhancement requests to Mike.Schmidt@Redcros.org (NOTE: the address is deliberately misspelled to keep crawlers from harvesting it).
Instructions:
Place your key in the "Api Key" edit box.
Press the Choose File button and select a CSV file to load. The CSV file must contain address information in one of these three formats: {houseno, aptno, streetname, city, state}, {address, city, state}, or {fulladdress}. The script will not load the file if it contains extraneous characters such as {", /}, so you need to remove them before loading the file. This is easily done by opening the CSV file (not xlsx) in Excel and replacing the extraneous characters with spaces. Then save the file and load it.
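For large files, the clean-up step can also be scripted instead of done by hand in Excel. The following is a minimal Python sketch, not part of this site: the names clean_field and clean_csv_text are hypothetical, and replacing commas and line breaks inside a field (so the cleaned file needs no quoting) is an added assumption beyond the two characters named above.

```python
import csv
import io

# The double quote and slash come from the instructions above. The comma and
# line breaks are an added assumption so cleaned fields never need quoting.
EXTRANEOUS = '"/,\r\n'


def clean_field(value: str) -> str:
    """Replace each extraneous character inside one field with a space."""
    return "".join(" " if ch in EXTRANEOUS else ch for ch in value)


def clean_csv_text(text: str) -> str:
    """Parse CSV text, clean every field, and return unquoted CSV text."""
    rows = csv.reader(io.StringIO(text, newline=""))
    lines = [",".join(clean_field(field) for field in row) for row in rows]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = 'address,city,state\n"12 Main St, Apt 3/4",Erie,PA\n'
    print(clean_csv_text(sample), end="")
```

Parsing with the csv module first means a quoted field that contains a comma is treated as one field before its characters are replaced.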
Enter a value in the "Distance in Feet to generate..." field. Default value will always trace it out.
Press the "Geocode" button and wait for all of the addresses to be processed.
Results show in the lower left (purple text). Select all and copy the output and paste into Excel or any text editor.
Additional output windows include "Accuracy Summary", "Distance Summary", and "User Errors", which provide statistics on Geocode Accuracy, Distance between Geocoded Address and Measured, and User Address Entry Error Rate respectively.
NOTICE: This site requires a Google Geocoding API key. By entering an API key in this web page, the user accepts full responsibility for any costs or damages incurred.
Using a Google API key:
Google allows 40K addresses to be processed per month for free. After that, it costs $5/1000 addresses. View Usage. Get a Google API key.
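As a rough budgeting aid, the pricing above can be turned into a small calculation. This Python sketch uses only the two figures quoted on this page (40,000 free addresses per month, then $5 per 1,000). The function name is hypothetical, and charging partial thousands pro rata is an assumption; check actual billing under View Usage.

```python
FREE_ADDRESSES_PER_MONTH = 40_000  # figure quoted on this page
PRICE_PER_1000_USD = 5.0           # figure quoted on this page


def estimated_monthly_cost(addresses: int) -> float:
    """Estimate the monthly charge in US dollars for a number of addresses.

    Assumption: addresses beyond the free allowance are billed pro rata.
    """
    billable = max(0, addresses - FREE_ADDRESSES_PER_MONTH)
    return billable * PRICE_PER_1000_USD / 1000


if __name__ == "__main__":
    for count in (10_000, 40_000, 50_000):
        print(count, estimated_monthly_cost(count))
```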
Google Api Key:
File to be processed
Select an output format. Details on these formats are provided later down the page.
Distance in Feet to generate the comment (dist= ? feet) in the notes field regarding the distance between accurately Geocoded Address locations and the Survey Point. This should be set to a distance that, if exceeded, would require scrutiny by a reviewer.
Description of output fields for VDAT
The output format for this script is optimized for use by Red Cross for VDAT (Virtual Damage Assessment Team) data processing activity. The script has three output formats: VDAT Fixed format, Free Format Casework, and Free Format General.
"VDAT Fixed Output Format" generates the same output regardless of the format of the input data. This means that attributes that were not provided will be empty in the output. This is done because in VDAT DDA processing, information comes from RcCollect and Street Sheets, and getting the input data formats to match is all but impossible. So we fix the output format.
"Free Format Casework Output Format" mimics the input format, with the exception that the "cas case id" input field is first in the output format. This is done to allow VDAT personnel to use VLOOKUPs when updating CW data so the addresses do not need to be re-geocoded. Also, some additional output fields are added depending on the type of address provided.
"Free Format General Output Format" mimics the input format, with some additional fields depending on the type of address provided.
Three types of address are supported by this script and are described below.
[Component Address] format includes "aptno", "houseno", "streetname", "city", "state" and "zip". When these are provided, the fields "r_streetname", "r_city" and "r_state" are added to the output to assist in fixing incorrect addresses. Additionally, we add "y", "x" and "accuracy" to show where the address was geocoded and the accuracy of the location.
[Standard Address] format includes "address", "city", "state" and "zip". When these are provided, the fields "r_city" and "r_state" are added to the output to assist in fixing incorrect addresses. Additionally, we add "y", "x", "accuracy", "houseno", and "streetname" from the geocoder.
[Full Address] format is a single attribute called "fulladdress". When this is provided, we add "y", "x", "accuracy", "houseno", "streetname", "city", "state" and "zip" from the geocoder.
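To check which of the three address types a file will be treated as before uploading it, the header row can be inspected. This is a minimal Python sketch and not the site's own logic: the function name, the case-insensitive matching, the order of precedence, and the treatment of "aptno" and "zip" as optional are all assumptions.

```python
# Column sets taken from the descriptions above. Treating "aptno" and "zip"
# as optional is an assumption.
COMPONENT = {"houseno", "streetname", "city", "state"}
STANDARD = {"address", "city", "state"}
FULL = {"fulladdress"}


def detect_address_format(header):
    """Return the address type implied by a CSV header row, or None."""
    names = {name.strip().lower() for name in header}
    if COMPONENT <= names:
        return "Component Address"
    if STANDARD <= names:
        return "Standard Address"
    if FULL <= names:
        return "Full Address"
    return None


if __name__ == "__main__":
    print(detect_address_format(["houseno", "aptno", "streetname", "city", "state"]))
    print(detect_address_format(["Address", "City", "State", "Zip"]))
    print(detect_address_format(["fulladdress"]))
```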
The following output attributes are described in more detail below.
r_? fields are used by the script to highlight when provided address fields do not match what the geocoder returns. This could be because the provided address is incorrect or because the geocoder result is incorrect. Determining which is correct can only be done by a human reviewer trying different methods and geocoders to find the property. For example, if the input attribute "streetname" was incorrectly entered as "cory" instead of the correct spelling "corry", the output value for "r_streetname" would be "cory" and "streetname" would be "corry". However, sometimes the provided address cannot be located (accuracy < 7). In this case, if we have values for measured_y and measured_x, we reverse geocode that point, force the resulting address to upper case (so it stands out), and place it in the r_? fields. This address should NOT be trusted as it is unlikely to be correct.
"y" and "x" are the same as latitude and longitude respectively and will be mapped. If the script detects fields called "lattitude", "longitude", "lat", or "long", it will prefix them with "orig_" to keep mapping software from using them instead of y and x.
The "note" field is added to the output so the user has a field in which to record information that assists them when reviewing properties in a spreadsheet. The attribute is set to "dist= ? feet" when the attributes measured_y and measured_x are set and the distance between the geocoded point (with accuracy RANGE_INTERPOLATED or better) and the measured values exceeds the user-specified limit. This value is designed to get the attention of reviewers by indicating that the geocoded address might be incorrect.
"accuracy" tells users the accuracy of the address. There are five possible values:
10 ROOFTOP Address known. Lat/Long updated with geocoded result
7 RANGE_INTERPOLATED Address interpolated. Lat/Long updated with geocoded result
4 GEOMETRIC_CENTER Address unknown. Used Reverse Geocoded point for County and Zip
2 APPROXIMATE Address unknown. Used Reverse Geocoded point and have no area information
0 Nothing could be determined and have no area information. Lat/Long is zero
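The accuracy values listed above correspond to the location_type values returned by the Google Geocoding API. The mapping can be written down as a small lookup, sketched here in Python. The function names are hypothetical; the threshold of 7 reflects the "accuracy < 7" rule used elsewhere on this page.

```python
# Numeric accuracy values from the list above, keyed by Google location_type.
ACCURACY_BY_LOCATION_TYPE = {
    "ROOFTOP": 10,
    "RANGE_INTERPOLATED": 7,
    "GEOMETRIC_CENTER": 4,
    "APPROXIMATE": 2,
}


def accuracy_score(location_type):
    """Return the numeric accuracy, or 0 when nothing could be determined."""
    return ACCURACY_BY_LOCATION_TYPE.get(location_type or "", 0)


def is_located(score):
    """True when the address counts as located (RANGE_INTERPOLATED or better)."""
    return score >= 7


if __name__ == "__main__":
    for kind in ("ROOFTOP", "GEOMETRIC_CENTER", None):
        score = accuracy_score(kind)
        print(kind, score, is_located(score))
```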
[measured_y, measured_x] These values are set by DA Field personnel when they record an address. The location of this point should be near the property but cannot be relied upon to be so. Frequently DA personnel do not update the point before submitting the property, or they enter the address in their hotel rooms. This value is never changed during processing.
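The "dist= ? feet" note compares the measured point with the geocoded point. The Python sketch below shows one way such a check could work. It is not the site's code: the haversine great-circle formula, the function names, and the rounding of the distance to whole feet are assumptions. The conditions (measured values present, accuracy of 7 or better, distance above the user limit) follow the description of the note field on this page.

```python
import math

# Mean Earth radius (6,371,008.8 m) converted to feet.
EARTH_RADIUS_FEET = 6_371_008.8 / 0.3048


def distance_feet(y1, x1, y2, x2):
    """Great-circle (haversine) distance in feet between two lat/long points."""
    phi1, phi2 = math.radians(y1), math.radians(y2)
    dphi = phi2 - phi1
    dlam = math.radians(x2 - x1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FEET * math.asin(math.sqrt(a))


def distance_note(y, x, measured_y, measured_x, accuracy, limit_feet):
    """Return a 'dist= N feet' note, or '' when no note applies."""
    if measured_y is None or measured_x is None:
        return ""  # no measured point to compare against
    if accuracy < 7:
        return ""  # geocoded point is not accurate enough to compare
    feet = distance_feet(y, x, measured_y, measured_x)
    if feet <= limit_feet:
        return ""
    return f"dist= {round(feet)} feet"


if __name__ == "__main__":
    print(distance_note(42.0, -80.0, 42.001, -80.0, accuracy=10, limit_feet=300))
```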
[extra_info] This attribute contains the combined values of the following attributes, merged by the script: {inaccesible_cause, inaccesible_desc, inaccesible_homes_count, sfd_destroyed_desc, sfd_major_desc, sfd_minor_desc, sfd_affected_desc, sfd_destroyed_other, sfd_major_other, sfd_minor_other, sfd_affected_other, apt_destroyed_desc, apt_major_desc, apt_minor_desc, apt_affected_desc, apt_destroyed_other, apt_major_other, apt_minor_other, apt_affected_other, mh_destroyed_desc, mh_major_desc, mh_minor_desc, mh_affected_desc, mh_destroyed_other, mh_major_other, mh_minor_other, mh_affected_other}. They are there for documentation purposes only.
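The r_? comparison described earlier in this section can be illustrated with the "cory" / "corry" example. This Python sketch is an illustration only: the function name is hypothetical, comparing values case-insensitively is an assumption, and the reverse-geocoding fallback for addresses that could not be located is not shown.

```python
def flag_mismatches(provided, geocoded, fields=("streetname", "city", "state")):
    """Return the geocoded values plus r_<field> entries for mismatches.

    r_<field> holds the originally provided value when it differs from the
    geocoder's value, and is empty otherwise.
    """
    result = dict(geocoded)
    for name in fields:
        given = (provided.get(name) or "").strip()
        found = (geocoded.get(name) or "").strip()
        differs = bool(given) and bool(found) and given.lower() != found.lower()
        result["r_" + name] = given if differs else ""
    return result


if __name__ == "__main__":
    provided = {"streetname": "cory", "city": "Erie", "state": "PA"}
    geocoded = {"streetname": "corry", "city": "Erie", "state": "PA"}
    print(flag_mismatches(provided, geocoded))
```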
How to process the output from this script
This script is optimized to assist VDAT personnel in identifying questionable addresses. In an ideal world, this would not be necessary. However, since it is not, reviewers have to find and fix incorrect or misspelled addresses. The quality of the address is determined by a geocoder. If the address is good, it updates address attributes and the {y,x} == {Lat,Long} information. The following lists some things to look for while reviewing the data.
Open the Output file with a spreadsheet editor
Filter out zero house numbers as they cannot be fixed
Sort by Accuracy so that low-accuracy entries are at the top
Attempt to find the correct address and, if found, update the component address fields. If you do NOT plan to run the output through the script again, then you need to update all component address fields along with y, x, accuracy and county manually. If you do plan to run the script again, only entries with "redo" in the note field or "" in the accuracy field are redone.
Look For Accuracies < 7
Try to geocode the address in Google and Bing (Bing allows multiple searches at once which is nice)
Look for misspellings of street and city
If measured_? fields are present, enter them in Google or Bing maps to check neighboring streets
If you fix the address, you need to add "redo" in the note column or null out the accuracy field
Review any address where r_? fields exist
Look for "dist= ? feet" in the note field and verify that the geocoder did not find the wrong address
Save file and run through script again if required.
You can run the script on the modified output as often as you need to
When you are done, you can run the script again on the modified output. By doing this, it allows you to only change address information and not have to change accuracy, y, x, county and zip fields. However, be aware of these behaviors.
Only entries whose note field contains "redo" or whose accuracy is null are redone.
The {"r_streetname", "r_city", and "r_state"} attributes will be cleared out.
Changes to address information need to be made in the original address source. So, if you are using "Standard Address" or "Full Address", make edits there if you want to make a second pass. Otherwise, edit the Component Address fields.
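The review and second-pass rules above can be sketched in Python for anyone who prefers to prepare the output file with a script rather than a spreadsheet. The column names "houseno", "accuracy" and "note" come from this page. The function names, the case-insensitive match on "redo", and sorting blank accuracies first are assumptions.

```python
def needs_redo(row):
    """True when a row would be redone on a second pass through the script."""
    note = (row.get("note") or "").lower()
    accuracy = (row.get("accuracy") or "").strip()
    return "redo" in note or accuracy == ""


def review_queue(rows):
    """Drop zero house numbers and sort so low accuracy comes first."""
    def accuracy_key(row):
        text = (row.get("accuracy") or "").strip()
        return float(text) if text else -1.0  # blank accuracy sorts first

    kept = [row for row in rows if (row.get("houseno") or "").strip() != "0"]
    return sorted(kept, key=accuracy_key)


if __name__ == "__main__":
    rows = [
        {"houseno": "12", "accuracy": "10", "note": ""},
        {"houseno": "0", "accuracy": "2", "note": ""},
        {"houseno": "7", "accuracy": "4", "note": "redo"},
        {"houseno": "9", "accuracy": "", "note": ""},
    ]
    for row in review_queue(rows):
        print(row["houseno"], row["accuracy"], needs_redo(row))
```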