sastrawi/tokenizer

PHP library that allows you to tokenize Bahasa Indonesia.

  • Wednesday, January 21, 2015
  • by andylibrian

Sastrawi Tokenizer

Sastrawi Tokenizer is a PHP library for tokenizing Indonesian (Bahasa Indonesia) text.

Tokenization

Saya sedang belajar NLP Bahasa Indonesia.

The text above can be tokenized into:

["Saya", "sedang", "belajar", "NLP", "Bahasa", "Indonesia", "."]
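To make the idea concrete, here is a minimal, self-contained sketch of a naive tokenizer that reproduces the token list above for this one sentence. It is not how Sastrawi Tokenizer is implemented; `naiveTokenize()` is a hypothetical helper name, and the regular expression is an assumption chosen only for illustration.

```php
<?php
// Naive illustration only; this is NOT Sastrawi Tokenizer's implementation.
// naiveTokenize() is a hypothetical helper, not part of the library.
function naiveTokenize(string $text): array
{
    // Match either a run of letters/digits or any single non-space symbol.
    preg_match_all('/[\p{L}\p{N}]+|[^\s\p{L}\p{N}]/u', $text, $matches);

    return $matches[0];
}

$tokens = naiveTokenize('Saya sedang belajar NLP Bahasa Indonesia.');

// Prints: ["Saya","sedang","belajar","NLP","Bahasa","Indonesia","."]
echo json_encode($tokens), PHP_EOL;
```

A rule this simple would also split inputs such as "Jl." or "5.000" at the period, which is the kind of case a language-aware tokenizer is meant to handle.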

Sastrawi Tokenizer

  • A PHP library for tokenizing Indonesian text.
  • Easy to integrate with other frameworks and packages.
  • Has a simple, easy-to-use API.

Demo

http://sastrawi.github.io/tokenizer.html

Installation

Sastrawi Tokenizer can be installed with Composer.

  1. Open a terminal (command line) and navigate to your project directory.
  2. Download Composer so that the composer.phar file is in that directory.
  3. Add Sastrawi Tokenizer to your composer.json file:
php composer.phar require sastrawi/tokenizer:0.*

If you are not yet familiar with Composer, please read Getting Started with Composer.

Usage

From PHP code

Copy the following code into a file in your project directory, then run that file.

<?php

// demo.php

// include composer autoloader
require_once __DIR__ . '/vendor/autoload.php';

$tokenizerFactory = new \Sastrawi\Tokenizer\TokenizerFactory();
$tokenizer = $tokenizerFactory->createDefaultTokenizer();

$tokens = $tokenizer->tokenize('Saya membeli barang seharga Rp 5.000 di Jl. Prof. Soepomo no. 67.');

var_dump($tokens);
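The sample sentence contains periods that do not end a sentence ("Rp 5.000", "Jl.", "Prof.", "no."). The sketch below shows, under stated assumptions, one way an abbreviation list can keep such tokens intact. It is an illustration only: `tokenizeWithAbbreviations()` and the three-entry abbreviation list are hypothetical, none of it is taken from the library's source, and the library's actual output for this sentence may differ.

```php
<?php
// Illustrative sketch; NOT Sastrawi Tokenizer's algorithm.
// tokenizeWithAbbreviations() and $abbreviations are hypothetical.
function tokenizeWithAbbreviations(string $text, array $abbreviations): array
{
    $tokens = [];
    $chunks = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($chunks as $chunk) {
        // Keep known abbreviations such as "Jl." as a single token.
        if (in_array(strtolower($chunk), $abbreviations, true)) {
            $tokens[] = $chunk;
            continue;
        }

        // Otherwise split off one trailing punctuation mark, if present.
        // "5.000" ends in a digit, so it is left untouched.
        if (preg_match('/^(.+?)([.,!?])$/u', $chunk, $m)) {
            $tokens[] = $m[1];
            $tokens[] = $m[2];
        } else {
            $tokens[] = $chunk;
        }
    }

    return $tokens;
}

$abbreviations = ['jl.', 'prof.', 'no.']; // tiny sample list, assumed for the demo
$tokens = tokenizeWithAbbreviations(
    'Saya membeli barang seharga Rp 5.000 di Jl. Prof. Soepomo no. 67.',
    $abbreviations
);

var_dump($tokens);
```

With this sample list, only the final period is split off as its own token; "5.000", "Jl.", "Prof." and "no." each stay whole.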

From the CLI (command-line interface)

The sastrawi-tokenize CLI reads text from STDIN and writes its tokens to STDOUT.

$ echo Saya sedang belajar NLP Bahasa Indonesia. | php vendor/bin/sastrawi-tokenize

To display help:

$ php vendor/bin/sastrawi-tokenize --help

License

Sastrawi Tokenizer is released under the MIT License (MIT). The library includes a list of Indonesian abbreviations licensed under Creative Commons BY-SA, sourced from http://id.wiktionary.org/wiki/Wiktionary:Daftar_singkatan_dan_akronim_bahasa_Indonesia.

Further Information

The Versions

  • dev-master (21 January 2015)
  • v0.4.0 (21 January 2015)
  • v0.3.0 (18 January 2015)
  • v0.2.0 (28 December 2014)
  • dev-development (4 December 2014)
  • v0.1.0 (4 December 2014)

All versions are released under the MIT license. Source: https://github.com/sastrawi/tokenizer

Keywords: nlp, natural language processing, tokenizer, indonesian, indonesia, tokenization, bahasa, text preprocessing