Klemens Muthmann
Technical University Dresden
Nöthnitzer Straße 46
01062 Dresden
Germany
Room: 3081
klemens.muthmann [at] tu-dresden.de
+49 351 463 38214
Web, OTMA, LinkedIn profile
Detecting Near-duplicate Relations in User Generated Forum Content
A web forum is a large database of community knowledge, with information on the most recent events and developments. Unfortunately, this knowledge is presented in a format that is easily understood by humans but not automatically by machines. However, observing several forums over a long period suggests that there are several distinct types of postings and of relations between them.
One frequently occurring and very annoying relation between two contributions is the near-duplicate relation. In this paper we propose work to detect and utilize contribution relations, concentrating on near-duplication. We propose ideas on how to calculate similarity, build groups of similar threads, and thus make near-duplicates in forums evident. One of the core theses is that information from forum and thread structure can be applied to improve existing near-duplicate detection approaches. In addition, the proposed work shows the qualitative and quantitative results of applying such principles, thereby finding out which features are really useful in the near-duplicate detection process. Also proposed are several sample applications that benefit from forum near-duplicate detection.

