Category Archives: NLP

IndoTimex for Indonesian Temporal Expressions

One question that got me thinking during my interviews with Google was, “Do you have any experience in building an NLP tool, like a tagger or a parser, for the Indonesian language?” My answer was, “Well, ahem, not yet.” I wonder why…

That’s why, during the last Christmas/New Year break *while waiting for the results of the interviews*, I decided to do something for the Indonesian language :”>. Actually, it’s almost the same thing I had already done for Italian… building an automatic extraction system for Indonesian temporal expressions!

Extraction means recognizing the time expressions in a given text and then normalizing their values. For example, if today’s date is March 25, 2015 (2015-03-25) and the system finds dua hari yang lalu [two days ago], the value will be normalized as 2015-03-23. I call the system IndoTimex!
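To make the normalization step a bit more concrete, here is a minimal Python sketch that handles only one pattern, “N hari yang lalu” [N days ago], relative to a given reference date. It is not how IndoTimex is implemented; the tiny number-word lexicon, the regular expression and the function name `normalize_days_ago` are illustrative assumptions.

```python
import re
from datetime import date, timedelta

# Hypothetical mini-lexicon of Indonesian number words (not IndoTimex's resources).
NUMBER_WORDS = {
    "satu": 1, "dua": 2, "tiga": 3, "empat": 4, "lima": 5,
    "enam": 6, "tujuh": 7, "delapan": 8, "sembilan": 9, "sepuluh": 10,
}

# Matches "<number> hari yang lalu", where <number> is digits or a number word.
DAYS_AGO = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+hari\s+yang\s+lalu\b",
    re.IGNORECASE,
)


def normalize_days_ago(text: str, today: date) -> list[tuple[str, str]]:
    """Return (expression, ISO date) pairs for every 'N hari yang lalu' in text."""
    results = []
    for match in DAYS_AGO.finditer(text):
        token = match.group(1).lower()
        days = int(token) if token.isdigit() else NUMBER_WORDS[token]
        # Anchor the relative expression to the reference date.
        results.append((match.group(0), (today - timedelta(days=days)).isoformat()))
    return results


if __name__ == "__main__":
    print(normalize_days_ago("Saya tiba dua hari yang lalu.", date(2015, 3, 25)))
    # → [('dua hari yang lalu', '2015-03-23')]
```

A real system needs many more patterns (dates, durations, vague expressions like “minggu depan” [next week]), which is why rule sets for temporal taggers tend to grow large.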

The online demo of IndoTimex is available here.

The complete system, implemented in Python, is available (for download) here.

And… since there was a conference deadline around that time, for PACLING 2015 *which will be held in Bali! :D*, I submitted a paper about it, and it got accepted. To learn more about the technology behind the system, please read the paper here.

If everything goes well, I will soon visit Bali (and, of course, home) for a vacation with my family, oh, and also for the conference ;). This is what an Indonesian proverb calls “sambil menyelam minum air” [drinking water while diving], i.e. killing two birds with one stone, hohoho…

Main NLP/CL 2015 Conference Deadlines

It’s a bit late, I know *I was caught up with some of these deadlines*, but here is the list of deadlines for the main Natural Language Processing (NLP) and Computational Linguistics (CL) conferences in 2015. I have also put the conferences’ important dates in a Google Calendar and made it publicly available at the following URLs:

The calendar’s timezone is GMT+01:00, since I’m in Italy. I couldn’t find a way to make the timezone adjust to your own calendar’s timezone. If you have an idea, please let me know.
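In the meantime, converting a date by hand is easy. Below is a minimal Python sketch, assuming the calendar times are plain wall-clock times at a fixed GMT+01:00 offset; `to_local` is a hypothetical helper and the 23:59 deadline time in the example is made up. Fixed offsets ignore daylight saving time; for a DST-aware zone, `astimezone` also accepts a `zoneinfo.ZoneInfo` object (Python 3.9+).

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used by the public calendar (GMT+01:00).
CALENDAR_TZ = timezone(timedelta(hours=1))


def to_local(event: datetime, utc_offset_hours: float) -> datetime:
    """Re-express a naive calendar time (assumed GMT+01:00) at another fixed UTC offset."""
    aware = event.replace(tzinfo=CALENDAR_TZ)
    return aware.astimezone(timezone(timedelta(hours=utc_offset_hours)))


if __name__ == "__main__":
    # A hypothetical 23:59 deadline on Feb 27, 2015, seen from Jakarta (UTC+7).
    print(to_local(datetime(2015, 2, 27, 23, 59), 7))
    # → 2015-02-28 05:59:00+07:00
```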

How do you subscribe to a public Google Calendar? See here.

Conference | Submission Date | Notification Date | Conference Date | Location
NAACL 2015 | Dec 5, 2014 | Feb 20, 2015 | Jun 1-3, 2015 | Denver, Colorado
SIGIR 2015 (long paper) | Jan 28, 2015 (Jan 21, 2015) | Apr 20, 2015 | Aug 9-13, 2015 | Santiago, Chile
CICLing 2015 | Feb 1, 2015 (Jan 25, 2015) | - | Apr 14-20, 2015 | Cairo, Egypt
IJCAI 2015 | Feb 12, 2015 (Feb 8, 2015) | Apr 16, 2015 | Jul 25-31, 2015 | Buenos Aires, Argentina
SIGIR 2015 (short paper) | Feb 18, 2015 | Apr 20, 2015 | Aug 9-13, 2015 | Santiago, Chile
ACL-IJCNLP 2015 (long paper) | Feb 27, 2015 | Apr 23, 2015 | Jul 26-31, 2015 | Beijing, China
*SEM 2015 | Mar 6, 2015 | Mar 30, 2015 | Jun 4-5, 2015 | Denver, Colorado
EAMT 2015 | Mar 7, 2015 | Mar 31, 2015 | May 11-13, 2015 | Antalya, Turkey
Interspeech 2015 | Mar 20, 2015 | Jun 1, 2015 | Sep 6-10, 2015 | Dresden, Germany
ACL-IJCNLP 2015 (short paper) | Apr 30, 2015 | Jun 8, 2015 | Jul 26-31, 2015 | Beijing, China
SIGDIAL 2015 | Apr 30, 2015 | Jun 12, 2015 | Sep 2-4, 2015 | Prague, Czech Republic
RANLP 2015 | May 4, 2015 (Apr 27, 2015) | Jun 22, 2015 | Sep 7-9, 2015 | Hissar, Bulgaria
CoNLL 2015 | May 4, 2015 | Jun 15, 2015 | Jul 30-31, 2015 | Beijing, China
EMNLP 2015 (long paper) | May 31, 2015 | Jul 24, 2015 | Sep 19-21, 2015 | Lisbon, Portugal
EMNLP 2015 (short paper) | Jun 15, 2015 | Jul 24, 2015 | Sep 19-21, 2015 | Lisbon, Portugal

Top NLP/CL Conferences and Journals

Picture taken from here.

I got this tip from the Research Methodology course I took last year, which is mandatory for first-year students in my PhD program: one of the first steps in a PhD is knowing your community, i.e. the conferences and journals in your field, where people with the same interests gather and share their knowledge and research.

For the Natural Language Processing / Computational Linguistics field, Google Scholar ranks publication venues (including conferences, journals and workshops) by how much their papers are cited *on the assumption that good papers are usually cited a lot*. The ranking looks like this:

  1. Meeting of the Association for Computational Linguistics (ACL)
  2. Conference on Empirical Methods in Natural Language Processing (EMNLP)
  3. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
  4. International Conference on Computational Linguistics (COLING)
  5. Language Resources and Evaluation (LREC)
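The ranking above comes from Google Scholar Metrics, which orders venues by their h5-index: the h-index of the articles a venue published in the last five complete years, i.e. the largest h such that h of those articles have at least h citations each. Here is a minimal Python sketch of that computation; the function names and the sample citation counts are made up for illustration.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def h5_index(papers: list[tuple[int, int]], last_complete_year: int) -> int:
    """h-index restricted to papers from the last five complete years.

    `papers` holds (publication_year, citation_count) pairs.
    """
    window = range(last_complete_year - 4, last_complete_year + 1)
    return h_index([cites for year, cites in papers if year in window])


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # → 4
```

One consequence of this design: a venue cannot climb the ranking on a single blockbuster paper, since it needs many papers that are each cited a lot.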

Main NLP/CL 2014 Conference Deadlines

Picture taken from PhD Comics.

Here is the list of deadlines for the main Natural Language Processing (NLP) and Computational Linguistics (CL) conferences in 2014.

Conference | Submission Date | Notification Date | Conference Date | Location
ICML 2014 (cycle 1 submission) | Oct 3, 2013 | Dec 9, 2013 | Jun 22-24, 2014 | Beijing, China
LREC 2014 | Oct 15, 2013 | Jan 31, 2014 | May 28-30, 2014 | Reykjavik, Iceland
EACL 2014 (long paper) | Oct 18, 2013 | Dec 20, 2013 | Apr 28-30, 2014 | Gothenburg, Sweden
EACL 2014 (short paper) | Jan 6, 2014 | Feb 24, 2014 | Apr 28-30, 2014 | Gothenburg, Sweden
CICLing 2014 | Jan 7, 2014 (Dec 31, 2013) | Jan 30, 2014 | Apr 6-12, 2014 | Kathmandu, Nepal
ACL 2014 (long paper) | Jan 10, 2014 | Mar 5, 2014 | Jun 23-25, 2014 | Baltimore, MD, USA
SIGIR 2014 (long paper) | Jan 27, 2014 | Apr 18, 2014 | Jul 6-11, 2014 | Gold Coast, Australia
ICML 2014 (cycle 2 submission) | Jan 31, 2014 | Apr 9, 2014 | Jun 22-24, 2014 | Beijing, China
AAAI 2014 | Feb 4, 2014 (Jan 31, 2014) | Apr 7, 2014 | Jul 27-31, 2014 | Quebec City, Canada
SIGIR 2014 (short paper) | Feb 17, 2014 | Apr 18, 2014 | Jul 6-11, 2014 | Gold Coast, Australia
SIGDIAL 2014 | Mar 9, 2014 | Apr 18, 2014 | Jun 18-20, 2014 | Philadelphia, PA, USA
ACL 2014 (short paper) | Mar 12, 2014 | Apr 17, 2014 | Jun 23-25, 2014 | Baltimore, MD, USA
CoNLL 2014 | Mar 14, 2014 | Apr 21, 2014 | Jun 26-27, 2014 | Baltimore, MD, USA
COLING 2014 | Mar 21, 2014 | May 23, 2014 | Aug 23-30, 2014 | Dublin, Ireland
EMNLP 2014 | Jun 2, 2014 | Jul 22, 2014 | Oct 26-28, 2014 | Doha, Qatar

(Source: here… a very, very helpful site.)


Thesis… Submitted!

Finally, exactly two weeks before D-Day, a.k.a. defense day, I managed to finish my thesis and submitted 3 bound copies to the university. I’m not sure why 3: one of them will go to the university library, I’m supposed to get one back after the defense, and the other one… perhaps because I have a co-supervisor at a different university.

I really like that we don’t have to make a hardcover thesis; we just have to bind the pages in a special binder, bright orange for the Faculty of Computer Science. That makes it easy if I need to change a few things: I just replace the pages *which is possible even after I’ve submitted them ;p*. One binder costs 6-something euros, so including printing it comes to around 20 euros in total for 3 copies. On top of that, I had to submit 2 special stamps of 14.6 euros each. In the end, graduating from here costs me quite a lot ;p.

Some crazy stories behind it…

Four semesters… over.

Continuing the tradition, as I wrote before here, there, and over there. When the third semester ended, for some reason I just wasn’t in the mood to write, so it went undocumented ;p. Now four semesters have passed, two years already! I should have graduated by now, but what can I do, my thesis is still stuck on the introduction page, hahahah.

A little story about the campus in Bolzano… I love it more than the one in Nancy! Why? Because when I leave my room and walk to campus listening to an mp3, by the time one song ends I’m already at the classroom door ^^. In Nancy, on the other hand, it was half an hour by bus to campus, and sometimes tragedies happened, especially when there was a protest. So, in short… the probability of me skipping class is much smaller in Bolzano ;).

I took 4 courses in total in semester III:
1. Digital Library
2. XML Data Management
3. Computational Linguistics
4. Theory of Computing

Besides that, I took 2 projects, one worth only 2 credits and the other worth 8 credits. I finished the 2-credit project back in semester III, while I only finished the report for the 8-credit project last week.

Let’s go through the courses and the two projects one by one…


First year… over.

Back from the holidays, I was still nervous because the email with my semester II results never showed up in my inbox. But then a friend sent me a link to the university website, which I had never opened because it’s in French ;p, and it turned out the grades had already been updated there. Alhamdulillah, the average is still fairly good ^^, although one grade spoils the mood, huh.

Unlike semester I, semester II was not split into 2 parts. Basically… semester II was more relaxed, hohoho. In some classes the only students were the three of us, 1st-year LCT students, while the 2nd-year students were busy with their theses.

There were 5 courses in total this semester, plus 1 research project. The courses were more basic than those in semester I. That’s because the semester I courses actually belonged to semester III, which is why we studied together with the 2nd-year students. This is the limitation of the LCT program in Nancy, whose international classes are “cobbled together”.

The courses this semester were:
1. Mathematics for Computer Science
2. Programming for Computational Linguistics
3. Introduction to Linguistics
4. Lexicons
5. Neural Networks

As for the research project, a one-semester project, our topic was cooking recipes :). Broadly speaking, working in the recipe domain, the project covered representing recipes as trees, tree clustering, and pattern mining. I’m too lazy to explain in more detail :p, message me privately if you’re interested.
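The project itself isn’t described here, so the following is only a hypothetical Python sketch of the general idea: a recipe as a tree of actions and ingredients, plus the simplest possible form of pattern mining over such trees, counting the parent-child edges that occur in at least `min_support` recipes. The `Node` class, the labels and `frequent_edges` are illustrative assumptions; tree clustering is not shown.

```python
from collections import Counter
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """One action or ingredient in a recipe tree (hypothetical representation)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def edges(node: Node) -> Iterator[tuple[str, str]]:
    """Yield every (parent label, child label) pair in the tree."""
    for child in node.children:
        yield (node.label, child.label)
        yield from edges(child)


def frequent_edges(trees: list[Node], min_support: int) -> dict[tuple[str, str], int]:
    """Count in how many trees each edge occurs; keep edges meeting min_support."""
    support = Counter()
    for tree in trees:
        support.update(set(edges(tree)))  # count each edge at most once per tree
    return {edge: n for edge, n in support.items() if n >= min_support}


if __name__ == "__main__":
    omelette = Node("fry", [Node("beat", [Node("egg"), Node("salt")])])
    scrambled = Node("stir", [Node("beat", [Node("egg"), Node("milk")])])
    print(frequent_edges([omelette, scrambled], min_support=2))
    # → {('beat', 'egg'): 2}
```

Real tree pattern mining looks for larger frequent subtrees, not just single edges, but the support-counting idea is the same.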

Okay, now about the courses, one by one.


The Human Brain… and What’s Inside Mine ;p

I just want to document what’s in my head, and at the same time remind myself why I ended up here, in case one day I get too carried away with traveling and forget about my studies… Sorry if the topic is too heavy ^^;

Semester II classes started two days ago. There are 5 courses in total, but so far only 4 have started: Mathematics for Computer Science *like Basic Calculus + Logic for Informatics all over again -_-*, Programming for Computational Linguistics *Python + Prolog, again… well, I only half remember Prolog*, Neural Networks, and Introduction to Linguistics.

I attended the last two just today, so they’re still fresh in my mind. And both of them inspired me to write.


1 Semester… Done. 3 More To Go!

Last Wednesday, with my last exam done, part II of the first semester *you can read about part I here* came to an end. Woohoo!! 🙂

There were actually 6 courses in part II of the semester, but all of them were electives, and we were only allowed to take 4. The courses I took were:
1. Grammatical Formalisms
2. Corpus Linguistics
3. Discourse and Dialogue
4. Lexicology and Lexicography

All four are Linguistics courses, deliberately so, because I’ve heard that later, at Univ. Bozen-Bolzano in Italy, the courses will be more Computer Science-oriented than Linguistics-oriented. The other two courses were Computational Semantics (Linguistics) and Database and Web Technology (Computer Science). Computational Semantics… argh, I should’ve taken that course instead of Discourse & Dialogue T___T You’ll find out why later…

Now, about the exams!


Exams… Done.

Exam week… is over!! Hip hip hurraaayy!!!

And so, part I of the first semester is over. Starting next week I’ll be wrestling with new courses. Yep, the system here is strange: one semester is split into 2 parts, with 4 courses in part I and 4 courses in part II.

The four part I courses are:
1. Logics & Statistics
2. Tools & Algorithms for NLP
3. NLP Application
4. Data Mining & Semantic Web

Logics & Statistics is split into 2 separate courses, so it has 2 exams. Data Mining & Semantic Web is also split into 2 courses, but it has only 1 exam. So there were 5 exams in total to knock out… hiyaaatt!
