A Data Mining Approach for Assessing Consistency between Multiple Representations in Spatial Databases

Abstract

When different spatial databases are combined, an important issue is the identification of inconsistencies between data. Quite often, representations of the same geographical entities in databases are different and reflect different points of view. In order to fully take advantage of these differences when object instances are associated, a key issue is to determine whether the differences are normal, i.e. explained by the database specifications, or whether they are due to erroneous or outdated data in one database. In this paper, we propose a knowledge‐based approach to partially automate the consistency assessment between multiple representations of data. The inconsistency detection is viewed as a knowledge‐acquisition problem, the source of knowledge being the data. The consistency assessment is carried out by applying a proposed method called MECO. This method is itself parameterized by domain knowledge obtained from a second method called MACO. MACO supports two approaches (direct or indirect) to perform the knowledge acquisition using data‐mining techniques. In particular, a supervised learning approach is defined to automate the knowledge acquisition so as to drastically reduce the domain expert’s work. With this approach, the knowledge‐acquisition process becomes faster and less expert‐dependent. Training examples are obtained automatically upon completion of the spatial data matching. Knowledge extraction from data following this bottom‐up approach is particularly useful, since the database specifications are generally complex, difficult to analyse, and manually encoded. Such a data‐driven process also sheds some light on the gap between textual specifications and those actually used to produce the data. The methodology is illustrated and experimentally validated by comparing geometrical representations and attribute values of different vector spatial databases. The advantages and limits of such partially automatic approaches are discussed, and directions for future work are suggested.

Publication
In International Journal of Geographical Information Science, 23(8), pp. 961-992