Action 2.1: Data mining

Collect additional information on historical earthquakes by applying data-mining and text-mining approaches to the web

Motivations

The aim is to collect new information on historical earthquakes by applying data mining and text mining to the very large amount of data available on the web. Many people in Europe are interested in history and genealogy, and they are very active online. They could form a real "task force" for finding historical documents on past earthquakes, documents that are very difficult to locate through a systematic investigation of national, regional, and local archives.

Program of research

  • Develop a process flow to enrich the macroseismic historical database
    1. Collect documents by automatically crawling URLs across the web; the result is the crawled database
    2. Apply Optical Character Recognition (OCR) to the crawled database
    3. Apply text mining to filter the crawled database by keywords
    4. Have a seismologist and a historian validate the final selected documents
  • Test the performance of the process, starting from selected URLs and keywords
    1. If successful, apply the process to a wider set of URLs and keywords.
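
As an illustration of the keyword-filtering step (step 3), here is a minimal Python sketch. It is not the project's actual implementation: the `Document` class, the function names, and the `min_hits` threshold are all hypothetical, and it assumes the crawled documents have already been converted to plain text (for example, by the OCR step).

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for one crawled document."""
    url: str
    text: str  # plain text, e.g. the output of the OCR step


def keyword_hits(text, keywords):
    """Return the keywords found in text as whole words, ignoring case."""
    hits = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


def filter_documents(documents, keywords, min_hits=1):
    """Keep documents matching at least min_hits distinct keywords.

    Returns (document, matched_keywords) pairs so that a reviewer can
    see why each document was selected.
    """
    selected = []
    for doc in documents:
        hits = keyword_hits(doc.text, keywords)
        if len(hits) >= min_hits:
            selected.append((doc, hits))
    return selected
```

Returning the matched keywords alongside each document is one possible design choice: it would give the seismologist and historian in step 4 a quick view of why a document was retained. A real pipeline would probably also need to handle OCR errors and historical spelling variants, which this sketch ignores.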

Organization

  • Type: Research study
  • Collaboration: QWAM Content Intelligence, EDF
  • Status: In progress (started 2017)