Basic article information

Title: Machine Learning For Real Estate Contracts Automatic Categorization of Text
Authors: IJCTC.Mani ; J.Jayasudha
Journal: International Journal of Computer Techniques
Electronic ISSN: 2394-2231
Publication year: 2016
Volume: 3
Issue: 2
Pages: 1-6
Language: English
Publisher: International Research Group - IRG
Abstract: Automatic text classification is a machine learning task that automatically assigns a given document to categories from a pre-defined set, based on its textual content and mined features. It has important applications in content management, contextual search, estimation mining, product review analysis, spam filtering, and text sentiment mining. This paper explains the generic strategy for automatic text classification and analyses existing solutions to major issues such as dealing with unstructured text, handling a large number of features, and selecting a machine learning technique appropriate to the text-classification application. Three kinds of model are considered: statistical, rule-based, and hybrid. A statistical model is built from training text configured for each category. A rule-based model is built from four kinds of term list: positive terms (mandatory terms), negative terms (excluding terms), relevant terms, and irrelevant terms. A hybrid model combines the statistical and rule-based models and is expected to give the most accurate results. The model is first created as a statistical model; terms are then added during fine-tuning, so the final model is a hybrid. We discuss in detail issues pertaining to three problems: document representation, classifier construction, and classifier evaluation.
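The hybrid model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made for the example: the choice of a multinomial naive Bayes scorer as the statistical model, the reading of positive terms as "all must appear" and negative terms as "none may appear", the fixed `boost` weight for relevant and irrelevant terms, the class name `HybridClassifier`, and the toy `lease` / `sale` contract categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class HybridClassifier:
    """Naive Bayes scores adjusted by per-category term rules (illustrative)."""

    def __init__(self, rules=None, boost=1.0):
        # rules: {category: {"positive": [...], "negative": [...],
        #                    "relevant": [...], "irrelevant": [...]}}
        self.rules = rules or {}
        self.boost = boost  # assumed weight per relevant/irrelevant term hit
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, category):
        """Statistical part: accumulate word counts per category."""
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def _statistical_score(self, tokens, category):
        # Log prior plus add-one smoothed log likelihoods of known words.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        counts = self.word_counts[category]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((counts[tok] + 1) / denom)
        return score

    def _rule_adjustment(self, tokens, category):
        # Returns None when a hard rule excludes the category.
        present = set(tokens)
        rule = self.rules.get(category, {})
        if any(t not in present for t in rule.get("positive", [])):
            return None  # a mandatory term is missing
        if any(t in present for t in rule.get("negative", [])):
            return None  # an excluding term is present
        hits = sum(t in present for t in rule.get("relevant", []))
        misses = sum(t in present for t in rule.get("irrelevant", []))
        return self.boost * (hits - misses)

    def classify(self, text):
        """Return the best category, or None if every category is excluded."""
        tokens = tokenize(text)
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            adjustment = self._rule_adjustment(tokens, category)
            if adjustment is None:
                continue
            score = self._statistical_score(tokens, category) + adjustment
            if score > best_score:
                best, best_score = category, score
        return best


# Toy training data (invented for this example).
TRAINING = [
    ("the tenant shall pay monthly rent to the landlord", "lease"),
    ("the landlord leases the premises to the tenant", "lease"),
    ("the buyer agrees to purchase the property from the seller", "sale"),
    ("the seller conveys title to the buyer at closing", "sale"),
]

# Fine-tuning step: terms are added on top of the statistical model.
clf = HybridClassifier(rules={
    "lease": {"negative": ["purchase"]},
    "sale": {"relevant": ["purchase"]},
})
for text, category in TRAINING:
    clf.train(text, category)

print(clf.classify("tenant pays rent to landlord"))
print(clf.classify("tenant has option to purchase"))
```

With this toy data, the second query would be scored as `lease` by the statistical part alone, while the added negative term on `lease` moves it to `sale`. This mirrors the workflow in the abstract, where a statistical model is built first and term lists are added afterwards for fine-tuning.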