ISSN 2394-5125
 

Research Article 


INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES

Deardo Dibrianto Sinaga, Seng Hansun.

Abstract
A text dataset, or "corpus", is the main resource needed for any text analysis task. However, only a
limited number of text corpora are available for the Indonesian language. This paper describes the text corpus
that was used for text document similarity detection in the Indonesian language with the Rabin-Karp and
Confix-Stripping algorithms, as reported in a 2018 journal publication. The corpus is divided into three
subject areas, i.e., Art, Medical, and Social, and its documents have been modified by applying a set of rules
to produce different sub-documents. The text dataset is stored in a public repository where it can be easily
accessed and used by other researchers for their own studies.

Keywords: Confix-Stripping; Corpus; Indonesian Language; Rabin-Karp; Text Analysis


 


How to Cite this Article
Pubmed Style

Deardo Dibrianto Sinaga, Seng Hansun. INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. JCR. 2020; 7(19): 6138-6142. doi:10.31838/jcr.07.19.711


Web Style

Deardo Dibrianto Sinaga, Seng Hansun. INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. http://www.jcreview.com/?mno=133758 [Access: September 14, 2020]. doi:10.31838/jcr.07.19.711





Vancouver/ICMJE Style

Deardo Dibrianto Sinaga, Seng Hansun. INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. JCR. (2020), [cited September 14, 2020]; 7(19): 6138-6142. doi:10.31838/jcr.07.19.711



Harvard Style

Deardo Dibrianto Sinaga, Seng Hansun (2020) INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. JCR, 7 (19), 6138-6142. doi:10.31838/jcr.07.19.711



Turabian Style

Deardo Dibrianto Sinaga, Seng Hansun. 2020. INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. Journal of Critical Reviews, 7 (19), 6138-6142. doi:10.31838/jcr.07.19.711



Chicago Style

Deardo Dibrianto Sinaga, Seng Hansun. "INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES." Journal of Critical Reviews 7 (2020), 6138-6142. doi:10.31838/jcr.07.19.711



MLA (The Modern Language Association) Style

Deardo Dibrianto Sinaga, Seng Hansun. "INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES." Journal of Critical Reviews 7.19 (2020), 6138-6142. Print. doi:10.31838/jcr.07.19.711



APA (American Psychological Association) Style

Deardo Dibrianto Sinaga, Seng Hansun (2020) INDONESIAN LANGUAGE TEXT CORPUS FOR TEXT ANALYSIS PURPOSES. Journal of Critical Reviews, 7 (19), 6138-6142. doi:10.31838/jcr.07.19.711