EntroMap: Detecting Spatially Varying Multivariate Relationships

EntroMap is a new approach that can detect the existence of multivariate relationships without assuming a prior relationship form. Existing local spatial analysis methods often assume a relationship form (e.g., a linear regression model) for all regions and focus on how the parameter values change over geographic space. The local entropy map instead calculates an approximation of the Rényi entropy for the multivariate data in each local region (in geographic space). Each local entropy value is then converted to a p-value by comparing it against a distribution of entropy values obtained by permutation for the same region. All p-values (one for each local region) are then processed by statistical tests to control the multiple-testing problem. Finally, the test results are mapped so that analysts can locate and interactively examine significant local relationships.
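The steps above can be illustrated with a minimal sketch. It is not the EntroMap implementation, and several choices are assumptions made only for illustration: the total edge length of a minimum spanning tree in attribute space stands in for the entropy approximation (a shorter tree suggesting lower entropy), the permutation shuffles one variable within the region, the variables are assumed to be on comparable scales, and the Benjamini-Hochberg procedure stands in for the multiple-testing step, which this page does not specify. The names `mst_length`, `local_p_value`, and `benjamini_hochberg` are hypothetical.

```python
import math
import random


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm).

    Used here as an assumed stand-in for the local entropy approximation.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def local_p_value(xs, ys, n_perm=199, seed=0):
    """Permutation p-value for one local region (assumed shuffling scheme).

    Counts how often a shuffled pairing of xs and ys gives a tree at least
    as short as the observed one.
    """
    rng = random.Random(seed)
    observed = mst_length(list(zip(xs, ys)))
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mst_length(list(zip(xs, shuffled))) <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag p-values that pass the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank that meets its threshold
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```

In use, `local_p_value` would be called once per local region and the resulting list of p-values passed to `benjamini_hochberg`; the regions flagged `True` are the ones that would be highlighted on the map.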

Keywords: Local Analysis, Entropy, Minimum Spanning Tree, Scan Statistics, Visual Analytics, Spatial Data Mining

Related Publication: 
  • Guo, D. (2010). "Local Entropy Map: A Non-Parametric Approach to Discover Spatially Varying Multivariate Relationships", International Journal of Geographical Information Science, 24 (9), pages 1367-1389.
  • Software manual will be available soon. Users can read the REDCAP manual to learn how to prepare data files for EntroMap.

Java is needed to run the following software. You can check whether Java is already installed on your computer at this link: http://java.com/en/download/installed.jsp.

Attachment            Size
entromap.jar          1.33 MB
election_data.zip     3.6 MB
synthetic_data.zip    150.36 KB

Comments

Tips for Using EntroMap

Due to the "curse of dimensionality", good relationships become less likely as more variables are involved. It is recommended that only a small number (e.g., 1 to 3) of carefully selected independent variables be used in an analysis. EntroMap is not a feature selection method, meaning that it cannot automatically ignore irrelevant variables.

Again because of the "curse of dimensionality", more variables call for a somewhat larger K value (i.e., the size of the neighborhood). In other words, more points are needed to faithfully represent a relationship when more variables are involved. For example, if there is only one independent variable, then K=35 or 50 would be enough. If 3 or 5 variables are involved, K should be around 75 or even higher. See the IJGIS paper for more information.
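To make the role of K concrete, the sketch below builds one local region as the K locations nearest to a chosen center. This is only an assumed reading of "size of the neighborhood": straight-line distance between point coordinates is used for illustration, and the function name `k_nearest_region` is hypothetical rather than part of EntroMap.

```python
import math


def k_nearest_region(coords, center_index, k):
    """Return the indices of the k locations closest to coords[center_index].

    The center itself is included because its distance to itself is zero.
    """
    center = coords[center_index]
    ranked = sorted(range(len(coords)), key=lambda i: math.dist(center, coords[i]))
    return ranked[:k]


# Five hypothetical locations; the region around location 0 with K=3.
coords = [(0, 0), (1, 0), (5, 5), (0, 2), (10, 10)]
region = k_nearest_region(coords, 0, 3)
```

A larger K puts more points in every region, which is what the tip above asks for when more variables are analyzed, at the cost of coarser spatial detail.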

The materials distributed on this website since 2008 are based upon work partially supported by the National Science Foundation under Grant No. 0748813. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).