ANALYSIS OF A NAMED ENTITY RECOGNITION MODEL FOR INDONESIAN-LANGUAGE TWEETS

Yuda Munarko

Abstract


In the field of Natural Language Processing (NLP), Named Entity Recognition (NER) is a widely studied subtopic. NER is used to classify the keywords in a document. The technique can then be applied to Twitter data to extract information. In general, existing NER algorithms achieve fairly satisfactory classification results, provided that the model they use is built for the domain being classified. Applying NER to Indonesian-language Twitter data therefore first requires building a model from an Indonesian-language Twitter corpus. In this study we analyse an Indonesian-language Twitter model that we built using the Conditional Random Field (CRF) algorithm. Our model achieves an average precision of 92.8% and an average recall of 83.9%. However, these averages could potentially be raised by using a Part-of-Speech (POS) tagger; in this study, the POS tagger is based on the Hidden Markov Model (HMM) algorithm.
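Both algorithms named in the abstract label a tweet token by token: the CRF assigns entity labels and the HMM assigns POS tags. For first-order (linear-chain) models of this kind, the highest-scoring label sequence is commonly found with the Viterbi algorithm. The following is a minimal, generic Viterbi sketch over additive scores. The BIO location labels, the example tweet ("macet parah di jalan sudirman", roughly "heavy traffic jam on Sudirman street"), and every score are hypothetical toy values. They are not the paper's trained model, feature set, or data.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def viterbi(
    tokens: Sequence[str],
    labels: Sequence[str],
    start: Dict[str, float],
    transition: Dict[Tuple[str, str], float],
    emission: Callable[[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for `tokens`.

    All scores are additive: log-probabilities for an HMM, or weighted
    feature sums for a linear-chain CRF. Missing start/transition entries
    are treated as forbidden (score -inf).
    """
    if not tokens:
        return []
    # best[i][y]: score of the best label path for tokens[0..i] ending in label y
    best = [{y: start.get(y, -math.inf) + emission(y, tokens[0]) for y in labels}]
    # back[i-1][y]: label at position i-1 on the best path ending in y at position i
    back: List[Dict[str, str]] = []
    for token in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev, score = max(
                ((p, best[-1][p] + transition.get((p, y), -math.inf)) for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + emission(y, token)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Pick the best final label, then follow the back-pointers to the start
    path = [max(best[-1], key=best[-1].get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy scores for illustration only; NOT learned from the paper's corpus.
LABELS = ["O", "B-LOC", "I-LOC"]
START = {"O": 0.0, "B-LOC": -1.0}  # I-LOC may not start a tweet
TRANSITION = {
    ("O", "O"): 0.0, ("O", "B-LOC"): -0.5,  # O -> I-LOC is forbidden (absent)
    ("B-LOC", "I-LOC"): 0.0, ("B-LOC", "O"): -0.5, ("B-LOC", "B-LOC"): -2.0,
    ("I-LOC", "I-LOC"): 0.0, ("I-LOC", "O"): -0.5, ("I-LOC", "B-LOC"): -2.0,
}
TOY_LEXICON = {
    "jalan": {"O": 0.0, "B-LOC": 2.0, "I-LOC": 0.0},
    "sudirman": {"O": 0.0, "B-LOC": 1.0, "I-LOC": 2.0},
}
DEFAULT_SCORES = {"O": 1.0, "B-LOC": -1.0, "I-LOC": -1.0}


def toy_emission(label: str, token: str) -> float:
    # Simplification: looks only at the current token; a real CRF emission
    # may inspect the whole tweet around this position (e.g. a POS tag)
    return TOY_LEXICON.get(token, DEFAULT_SCORES)[label]


print(viterbi(["macet", "parah", "di", "jalan", "sudirman"], LABELS, START, TRANSITION, toy_emission))
# → ['O', 'O', 'O', 'B-LOC', 'I-LOC']
```

In an HMM tagger, the start, transition, and emission scores would be log-probabilities estimated from a tagged corpus. In a CRF, they would be weighted feature sums learned during training. A POS tag produced by the HMM could enter the CRF as one more emission feature. Training either model is outside the scope of this sketch.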

DOI: https://doi.org/10.22219/sentra.v0i2.1897

Secretariat

Faculty of Engineering

Universitas Muhammadiyah Malang, Campus III

Jl. Raya Tlogomas 246 Malang, 65144