Diff: STRATO-apps/wordpress_03/app/wp-content/plugins/aimogen-pro/res/rake-php-plus/README.md
No baseline file; diff against empty only.
1
-
1
+
# rake-php-plus
2
+
A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).
3
+
4
+
[rake-php-plus on Packagist](https://packagist.org/packages/donatello-za/rake-php-plus)
5
+
6
+
7
+
8
+
## Introduction
9
+
10
+
Keywords describe the main topics of a document or text. Keyword *extraction* automatically identifies the important words and phrases in a text.
11
+
12
+
Extracted keywords can be used for things like:
13
+
- Building a list of useful tags out of a larger text
14
+
- Building search indexes and search engines
15
+
- Grouping similar content by its topic
16
+
17
+
Extracted phrases can be used for things like:
18
+
- Highlighting important areas of a larger text
19
+
- Language or documentation analysis
20
+
- Building intelligent searches based on contextual terms
21
+
22
+
This library provides an easy method for PHP developers to get a list of keywords and phrases from a string of text
23
+
and is based on another smaller and unmaintained project called [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) by Richard Filipčík,
24
+
which is a translation from a Python implementation simply called [RAKE](https://github.com/aneesha/RAKE).
25
+
26
+
> *As described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010).
27
+
[Automatic Keyword Extraction from Individual Documents](https://www.researchgate.net/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents).
28
+
In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.*
29
+
30
+
This package aims to offer the following improvements over the original [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) package:
31
+
32
+
1. [PSR-2](http://www.php-fig.org/psr/psr-2/) coding standards.
33
+
2. [PSR-4](http://www.php-fig.org/psr/psr-4/) autoloading, so the package is [Composer](https://getcomposer.org) installable.
34
+
3. Additional functionality such as method chaining.
35
+
4. Multiple ways to provide source stopwords.
36
+
5. Full unit test coverage.
37
+
6. Performance improvements.
38
+
7. Improved documentation.
39
+
8. Easy language integration and multibyte string support.
40
+
41
+
## Currently Supported Languages
42
+
43
+
* Afrikaans (af_ZA)
44
+
* Arabic (United Arab Emirates)/الإمارات العربية المتحدة (ar_AE)
45
+
* Brazilian Portuguese/português do Brasil (pt_BR)
46
+
* English US (en_US)
47
+
* European Portuguese/português europeu (pt_PT)
48
+
* French/le français (fr_FR)
49
+
* German (Germany)/Deutsch (Deutschland) (de_DE)
50
+
* Italian/italiano (it_IT)
51
+
* Polish/język polski (pl_PL)
52
+
* Russian/русский язык (ru_RU)
53
+
* Sorani Kurdish/سۆرانی (ckb_IQ)
54
+
* Spanish/español (es_AR)
55
+
* Tamil/தமிழ் (ta_TA)
56
+
* Turkish/Türkçe (tr_TR)
57
+
* Persian/Farsi/فارسی (fa_IR)
58
+
* Dutch/Nederlands (nl_NL)
59
+
* Swedish/svenska (sv_SE)
60
+
61
+
> If your language is not listed here, it can be added; please see the section
62
+
called **[How to add additional languages](#how-to-add-additional-languages)** at the bottom of the page.
63
+
64
+
## Version
65
+
66
+
v1.0.19
67
+
68
+
## Special Thanks
69
+
70
+
* [Jarosław Wasilewski](https://github.com/Orajo): Polish language and improving multi-byte support.
71
+
* [Lev Morozov](https://github.com/levmorozov): French and Russian languages.
72
+
* [Igor Carvalho](https://github.com/Carvlho): Brazilian Portuguese language.
73
+
* [Khoshbin Ali Ahmed](https://github.com/Xoshbin): Sorani Kurdish and Arabic languages.
74
+
* [RhaPT](https://github.com/RhaPT): European Portuguese language.
75
+
* [Peter Thaleikis](https://github.com/spekulatius): German language.
76
+
* [Yusuf Usta](https://github.com/yusufusta): Turkish language.
77
+
* [orthosie](https://github.com/orthosie): Tamil language.
78
+
* [ScIEnzY](https://github.com/ScIEnzY): Italian language.
79
+
* [Reza Rabbani](https://github.com/thrashzone13): Persian language.
80
+
* [Anne van der Aar](https://github.com/annevanderaar): Dutch language.
81
+
82
+
## Installation
83
+
84
+
### With Composer
85
+
86
+
```bash
87
+
$ composer require donatello-za/rake-php-plus
88
+
```
89
+
90
+
91
+
```json
92
+
{
93
+
    "require": {
94
+
        "donatello-za/rake-php-plus": "^1.0"
95
+
    }
96
+
}
97
+
```
98
+
99
+
```php
100
+
<?php
101
+
require 'vendor/autoload.php';
102
+
103
+
use DonatelloZa\RakePlus\RakePlus;
104
+
```
105
+
106
+
### Without Composer
107
+
108
+
```php
109
+
<?php
110
+
111
+
require 'path/to/AbstractStopwordProvider.php';
112
+
require 'path/to/ILangParseOptions.php';
113
+
require 'path/to/LangParseOptions.php';
114
+
require 'path/to/StopwordArray.php';
115
+
require 'path/to/StopwordsPatternFile.php';
116
+
require 'path/to/StopwordsPHP.php';
117
+
require 'path/to/RakePlus.php';
118
+
119
+
use DonatelloZa\RakePlus\RakePlus;
120
+
121
+
```
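If listing every file by hand is inconvenient, a small PSR-4 style autoloader is one possible alternative. The sketch below is only an illustration: it assumes that all library classes live in the `DonatelloZa\RakePlus` namespace (as the `use` statement above suggests) and that the source files sit together in one directory, written here as the placeholder `path/to`. The helper name `rakePlusClassToFile` is made up for this example and is not part of the library.

```php
<?php

/**
 * Map a fully qualified class name to a file path, PSR-4 style.
 * Returns null for classes outside the given namespace prefix.
 * (Hypothetical helper for illustration, not part of rake-php-plus.)
 */
function rakePlusClassToFile(string $class, string $prefix, string $baseDir): ?string
{
    // Only handle classes that start with the namespace prefix.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Strip the prefix and turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

// Assumption: replace 'path/to' with the directory that holds the library files.
spl_autoload_register(function (string $class): void {
    $file = rakePlusClassToFile($class, 'DonatelloZa\\RakePlus\\', 'path/to');
    if ($file !== null && is_file($file)) {
        require $file;
    }
});
```

With this in place, classes are loaded on first use, so the order of the `require` statements no longer matters.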
122
+
123
+
## Example 1
124
+
125
+
Creates a new instance of RakePlus, extracts the phrases and returns the results. Assumes that the specified
126
+
text is English (US).
127
+
128
+
129
+
```php
130
+
use DonatelloZa\RakePlus\RakePlus;
131
+
132
+
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
133
+
"strict inequations, and nonstrict inequations are considered. Upper bounds " .
134
+
"for components of a minimal set of solutions and algorithms of construction " .
135
+
"of minimal generating sets of solutions for all types of systems are given.";
136
+
137
+
$phrases = RakePlus::create($text)->get();
138
+
139
+
print_r($phrases);
140
+
```
141
+
142
+
```
143
+
Array
144
+
(
145
+
[0] => criteria
146
+
[1] => compatibility
147
+
[2] => system
148
+
[3] => linear diophantine equations
149
+
[4] => strict inequations
150
+
[5] => nonstrict inequations
151
+
[6] => considered
152
+
[7] => upper bounds
153
+
[8] => components
154
+
[9] => minimal set
155
+
[10] => solutions
156
+
[11] => algorithms
157
+
[12] => construction
158
+
[13] => minimal generating sets
159
+
[14] => types
160
+
[15] => systems
161
+
)
162
+
```
163
+
164
+
## Example 2
165
+
166
+
Creates a new instance of RakePlus, extracts the phrases in different orders
167
+
and also shows how to get the phrase scores.
168
+
169
+
```php
170
+
use DonatelloZa\RakePlus\RakePlus;
171
+
172
+
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
173
+
"strict inequations, and nonstrict inequations are considered. Upper bounds " .
174
+
"for components of a minimal set of solutions and algorithms of construction " .
175
+
"of minimal generating sets of solutions for all types of systems are given.";
176
+
177
+
// Note: en_US is the default language.
178
+
$rake = RakePlus::create($text, 'en_US');
179
+
180
+
// 'asc' is optional and is the default sort order
181
+
$phrases = $rake->sort('asc')->get();
182
+
print_r($phrases);
183
+
```
184
+
185
+
```
186
+
Array
187
+
(
188
+
[0] => algorithms
189
+
[1] => compatibility
190
+
[2] => components
191
+
[3] => considered
192
+
[4] => construction
193
+
[5] => criteria
194
+
[6] => linear diophantine equations
195
+
[7] => minimal generating sets
196
+
[8] => minimal set
197
+
[9] => nonstrict inequations
198
+
[10] => solutions
199
+
[11] => strict inequations
200
+
[12] => system
201
+
[13] => systems
202
+
[14] => types
203
+
[15] => upper bounds
204
+
)
205
+
```
206
+
207
+
```php
208
+
// Sort in descending order
209
+
$phrases = $rake->sort('desc')->get();
210
+
print_r($phrases);
211
+
```
212
+
213
+
```
214
+
Array
215
+
(
216
+
[0] => upper bounds
217
+
[1] => types
218
+
[2] => systems
219
+
[3] => system
220
+
[4] => strict inequations
221
+
[5] => solutions
222
+
[6] => nonstrict inequations
223
+
[7] => minimal set
224
+
[8] => minimal generating sets
225
+
[9] => linear diophantine equations
226
+
[10] => criteria
227
+
[11] => construction
228
+
[12] => considered
229
+
[13] => components
230
+
[14] => compatibility
231
+
[15] => algorithms
232
+
)
233
+
```
234
+
235
+
```php
236
+
// Sort the phrases by score and return the scores
237
+
$phrase_scores = $rake->sortByScore('desc')->scores();
238
+
print_r($phrase_scores);
239
+
```
240
+
241
+
```
242
+
Array
243
+
(
244
+
[linear diophantine equations] => 9
245
+
[minimal generating sets] => 8.5
246
+
[minimal set] => 4.5
247
+
[strict inequations] => 4
248
+
[nonstrict inequations] => 4
249
+
[upper bounds] => 4
250
+
[criteria] => 1
251
+
[compatibility] => 1
252
+
[system] => 1
253
+
[considered] => 1
254
+
[components] => 1
255
+
[solutions] => 1
256
+
[algorithms] => 1
257
+
[construction] => 1
258
+
[types] => 1
259
+
[systems] => 1
260
+
)
261
+
```
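The scores above are consistent with the scheme described in the RAKE paper: each word scores its degree divided by its frequency, and a phrase scores the sum of its words. The following is a minimal sketch of that scheme under those assumptions, not the library's actual code. The function name `rakeScoreSketch` is hypothetical, and the input is a hand-picked subset of the candidate phrases from the example text.

```php
<?php

/**
 * Score candidate phrases as described in the RAKE paper:
 * word score = degree / frequency, phrase score = sum of its word scores.
 * (Illustrative sketch; hypothetical function, not the RakePlus API.)
 *
 * @param string[] $phrases Candidate phrases, already split on stopwords.
 * @return array Phrase => score, highest score first.
 */
function rakeScoreSketch(array $phrases): array
{
    // Split a phrase into its words.
    $split = function (string $phrase): array {
        return preg_split('/\s+/u', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    };

    $frequency = [];
    $degree = [];

    foreach ($phrases as $phrase) {
        $words = $split($phrase);
        foreach ($words as $word) {
            // Frequency: how often the word occurs across all phrases.
            $frequency[$word] = ($frequency[$word] ?? 0) + 1;
            // Degree: summed length of every phrase the word occurs in.
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    $scores = [];
    foreach ($phrases as $phrase) {
        $score = 0;
        foreach ($split($phrase) as $word) {
            $score += $degree[$word] / $frequency[$word];
        }
        $scores[$phrase] = $score;
    }

    // Highest score first, keeping the phrase keys.
    arsort($scores);

    return $scores;
}

$scores = rakeScoreSketch([
    'linear diophantine equations',
    'strict inequations',
    'nonstrict inequations',
    'minimal set',
    'minimal generating sets',
]);
print_r($scores);
```

For example, "minimal" occurs in two phrases with a combined length of 5 words, so it scores 5 / 2 = 2.5, which is why "minimal generating sets" reaches 8.5 and "minimal set" 4.5.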
262
+
263
+
```php
264
+
// Extract phrases from a new string on the same RakePlus instance. Using the
265
+
// same RakePlus instance is faster than creating a new instance as the
266
+
// language files do not have to be re-loaded and parsed.
267
+
268
+
$text = "A fast Fourier transform (FFT) algorithm computes...";
269
+
$phrases = $rake->extract($text)->sort()->get();
270
+
print_r($phrases);
271
+
```
272
+
273
+
```
274
+
Array
275
+
(
276
+
[0] => algorithm computes
277
+
[1] => fast fourier transform
278
+
[2] => fft
279
+
)
280
+
```
281
+
282
+
## Example 3
283
+
284
+
Creates a new instance of RakePlus and extracts the unique keywords from the phrases.
285
+
286
+
```php
287
+
use DonatelloZa\RakePlus\RakePlus;
288
+
289
+
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
290
+
"strict inequations, and nonstrict inequations are considered. Upper bounds " .
291
+
"for components of a minimal set of solutions and algorithms of construction " .
292
+
"of minimal generating sets of solutions for all types of systems are given.";
293
+
294
+
$keywords = RakePlus::create($text)->keywords();
295
+
print_r($keywords);
296
+
```
297
+
298
+
```
299
+
Array
300
+
(
301
+
[0] => criteria
302
+
[1] => compatibility
303
+
[2] => system
304
+
[3] => linear
305
+
[4] => diophantine
306
+
[5] => equations
307
+
[6] => strict
308
+
[7] => inequations
309
+
[8] => nonstrict
310
+
[9] => considered
311
+
[10] => upper
312
+
[11] => bounds
313
+
[12] => components
314
+
[13] => minimal
315
+
[14] => set
316
+
[15] => solutions
317
+
[16] => algorithms
318
+
[17] => construction
319
+
[18] => generating
320
+
[19] => sets
321
+
[20] => types
322
+
[21] => systems
323
+
)
324
+
```
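Conceptually, the keyword list is the set of individual words that make up the extracted phrases, in order of first appearance. A rough sketch of that idea follows; `phrasesToKeywordsSketch` is a hypothetical helper that assumes phrases are whitespace-separated words, and the library's own implementation may differ.

```php
<?php

/**
 * Collect the unique words of a list of phrases, keeping first-seen order.
 * (Illustrative sketch; hypothetical function, not the RakePlus API.)
 *
 * @param string[] $phrases
 * @return string[]
 */
function phrasesToKeywordsSketch(array $phrases): array
{
    $seen = [];

    foreach ($phrases as $phrase) {
        $words = preg_split('/\s+/u', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Array keys de-duplicate the words while preserving insertion order.
            $seen[$word] = true;
        }
    }

    // PHP turns numeric string keys into integers, so cast back to strings.
    return array_map('strval', array_keys($seen));
}

$keywords = phrasesToKeywordsSketch([
    'strict inequations',
    'nonstrict inequations',
    'minimal set',
    'minimal generating sets',
]);
print_r($keywords);
```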
325
+
326
+
## Example 4
327
+
328
+
Creates a new instance of RakePlus without using the static RakePlus::create method.
329
+
330
+
```php
331
+
use DonatelloZa\RakePlus\RakePlus;
332
+
333
+
$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
334
+
"strict inequations, and nonstrict inequations are considered. Upper bounds " .
335
+
"for components of a minimal set of solutions and algorithms of construction " .
336
+
"of minimal generating sets of solutions for all types of systems are given.";
337
+
338
+
$rake = new RakePlus();
339
+
$phrases = $rake->extract($text)->get();
340
+
341
+
// Alternative method:
342
+
$phrases = (new RakePlus($text))->get();
343
+
```
344
+
345
+
## Example 5
346
+
347
+
You can provide custom stopwords in four different ways:
348
+
349
+
```php
350
+
use DonatelloZa\RakePlus\RakePlus;
351
+
352
+
// 1: The standard way (provide a language code)
353
+
// RakePlus will first look for ./lang/en_US.pattern, if
354
+
// not found, it will look for ./lang/en_US.php.
355
+
$rake = RakePlus::create($text, 'en_US');
356
+
357
+
// 2: Pass an array containing stopwords
358
+
$rake = RakePlus::create($text, ['a', 'able', 'about', 'above', ...]);
359
+
360
+
// 3: Pass the name of a PHP or pattern file,
361
+
// see lang/en_US.php and lang/en_US.pattern for examples.
362
+
$rake = RakePlus::create($text, '/path/to/my/stopwords.pattern');
363
+
364
+
// 4: Create an instance of one of the stopword provider classes (or
365
+
// create your own) and pass that to RakePlus:
366
+
$stopwords = \DonatelloZa\RakePlus\StopwordArray::create(['a', 'able', 'about', 'above', ...]);
367
+
$rake = RakePlus::create($text, $stopwords);
368
+
```
369
+
370
+
## Example 6
371
+
372
+
You can specify the minimum number of characters that a phrase or keyword
373
+
must contain; anything shorter is filtered out. The
374
+
default is 0 (no minimum).
375
+
376
+
```php
377
+
use DonatelloZa\RakePlus\RakePlus;
378
+
379
+
$text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';
380
+
381
+
// Without a minimum
382
+
$phrases = RakePlus::create($text, 'en_US', 0)->get();
383
+
print_r($phrases);
384
+
```
385
+
386
+
```
387
+
Array
388
+
(
389
+
[0] => crest suite
390
+
[1] => 413 lake carlietown
391
+
[2] => wa 12643
392
+
)
393
+
```
394
+
395
+
```php
396
+
// With a minimum
397
+
$phrases = RakePlus::create($text, 'en_US', 10)->get();
398
+
399
+
print_r($phrases);
400
+
```
401
+
402
+
```
403
+
Array
404
+
(
405
+
[0] => crest suite
406
+
[1] => 413 lake carlietown
407
+
)
408
+
```
409
+
410
+
## Example 7
411
+
412
+
You can specify whether phrases or keywords that consist only of a
413
+
number should be filtered out. The default is to filter out
414
+
numerics.
415
+
416
+
```php
417
+
use DonatelloZa\RakePlus\RakePlus;
418
+
419
+
$text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';
420
+
421
+
// Filter out numerics
422
+
$phrases = RakePlus::create($text, 'en_US', 0, true)->get();
423
+
print_r($phrases);
424
+
```
425
+
426
+
```
Array
427
+
(
428
+
[0] => crest suite
429
+
[1] => 413 lake carlietown
430
+
[2] => wa 12643
431
+
)
432
+
```
433
+
434
+
```php
435
+
// Do not filter out numerics
436
+
$phrases = RakePlus::create($text, 'en_US', 0, false)->get();
437
+
438
+
print_r($phrases);
439
+
```
440
+
441
+
```
442
+
Array
443
+
(
444
+
[0] => 6462
445
+
[1] => crest suite
446
+
[2] => 413 lake carlietown
447
+
[3] => wa 12643
448
+
)
449
+
```
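Taken together, the minimum-length option from Example 6 and the numeric option from Example 7 behave like a filter over the candidate phrases. A minimal sketch of that behaviour, assuming plain string phrases and the `mbstring` extension; `filterPhrasesSketch` is a hypothetical helper and not part of the RakePlus API:

```php
<?php

/**
 * Drop phrases that are too short or, optionally, purely numeric.
 * (Illustrative sketch; hypothetical function, not the RakePlus API.)
 *
 * @param string[] $phrases
 * @return string[]
 */
function filterPhrasesSketch(array $phrases, int $minLength = 0, bool $filterNumerics = true): array
{
    $kept = array_filter(
        $phrases,
        function (string $phrase) use ($minLength, $filterNumerics): bool {
            // Too short: fewer characters than the requested minimum.
            if (mb_strlen($phrase) < $minLength) {
                return false;
            }
            // Purely numeric phrases such as "6462" are dropped when requested.
            if ($filterNumerics && is_numeric($phrase)) {
                return false;
            }
            return true;
        }
    );

    // Re-index so the result is a plain list.
    return array_values($kept);
}

$candidates = ['6462', 'crest suite', '413 lake carlietown', 'wa 12643'];
print_r(filterPhrasesSketch($candidates, 10, true));
```

Note that "413 lake carlietown" survives the numeric filter because only phrases that are numeric as a whole are removed.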
450
+
451
+
## How to add additional languages
452
+
453
+
The library requires a list of "stopwords" for each language. Stopwords are common words used in a language such as "and", "are", "or", etc.
454
+
455
+
There are [stopwords for 50 languages](https://github.com/Donatello-za/stopwords-json#languages) (including the ones already supported) available in JSON format.
456
+
If you are lucky enough to have your language listed then you can easily import it into the library. To
457
+
do so, read the section below:
458
+
459
+
**Using the stopwords extractor tool**
460
+
461
+
> Note: These instructions assume you are using Linux.
462
+
463
+
We will be using the Greek language as an example:
464
+
465
+
1. Check whether your operating system has the Greek localisation files. The Greek locale
466
+
code to look for is `el_GR`, so run the command `locale -a` to see if it is listed.
467
+
2. If it is not listed, you'll need to create it, so run:
468
+
469
+
```sh
470
+
sudo locale-gen el_GR
471
+
sudo locale-gen el_GR.utf8
472
+
```
473
+
474
+
3. Go to the [list of stopword files](https://github.com/Donatello-za/stopwords-json#languages) and
475
+
find the Greek language. The file is called `el.json` and contains 75 stopwords.
476
+
4. Download the `el.json` file and store it somewhere on your system.
477
+
5. In your terminal, go to the directory of the `rake-php-plus` library; it will
478
+
be under `vendor/donatello-za/rake-php-plus` if you used Composer to install it.
479
+
480
+
We now need to use the JSON file to create two new files, one will be a `.php` file
481
+
that contains the stopwords as a PHP array and one will be a `.pattern` file which
482
+
is a text file containing the stopwords as a regular expression:
483
+
484
+
1. Extract and convert the .json file to a PHP file by running:
485
+
486
+
```sh
487
+
$ php ./console/extractor.php path/to/el.json --locale=el_GR --output=php > ./some/dir/el_GR.php
488
+
```
489
+
490
+
2. Extract and convert the .json file to a .pattern file by running:
491
+
492
+
```sh
493
+
$ php ./console/extractor.php path/to/el.json --locale=el_GR --output=pattern > ./some/dir/el_GR.pattern
494
+
```
495
+
496
+
That is it! You can now use the new stopwords by specifying the file when creating an instance
497
+
of the RakePlus class, for example:
498
+
499
+
```php
500
+
$rake = RakePlus::create($text, '/some/dir/el_GR.pattern');
501
+
```
502
+
503
+
or
504
+
505
+
```php
506
+
$rake = RakePlus::create($text, '/some/dir/el_GR.php');
507
+
```
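To make the idea of a `.pattern` file more concrete, the sketch below turns a list of stopwords into a single regular expression and uses it to split a sentence into candidate phrases. This is an assumption-laden illustration: the exact pattern written by `console/extractor.php` may be structured differently, `stopwordsToPatternSketch` is a hypothetical helper, and word-boundary handling for non-Latin scripts may need more care than shown here.

```php
<?php

/**
 * Build one regular expression that matches any of the given stopwords
 * as a whole word, case-insensitively.
 * (Illustrative sketch; not the format guaranteed by the extractor tool.)
 *
 * @param string[] $stopwords
 */
function stopwordsToPatternSketch(array $stopwords): string
{
    // Escape characters that have a special meaning inside a regex.
    $escaped = array_map(function (string $word): string {
        return preg_quote($word, '/');
    }, $stopwords);

    return '/\b(?:' . implode('|', $escaped) . ')\b/iu';
}

$pattern = stopwordsToPatternSketch(['and', 'are', 'or']);

// Splitting on stopwords leaves the candidate phrases between them.
$candidates = array_map('trim', preg_split($pattern, 'cats and dogs are friends'));
print_r($candidates);
```

A precompiled pattern like this can be applied directly, whereas an array of stopwords first has to be converted into one.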
508
+
509
+
**Contribute by Adding a Language**
510
+
511
+
If you want your language to be officially supported, you can fork this library,
512
+
generate the `.pattern` and `.php` stopword files as described above, place them
513
+
in the `./rake-php-plus/lang/` directory and submit it as a pull request.
514
+
515
+
Once your language is officially supported, you'll be able to specify the language
516
+
without having to specify the file to use, for example:
517
+
518
+
```php
519
+
$rake = RakePlus::create($text, 'el_GR');
520
+
```
521
+
522
+
RakePlus will always look for a `.pattern` file first and if not found it will
523
+
look for a `.php` file in the `./lang/` directory.
524
+
525
+
**I don't have a stopwords file for my language, what now?**
526
+
527
+
If your language is not covered in the [list of 50 languages here](https://github.com/Donatello-za/stopwords-json#languages)
528
+
you may have to find one elsewhere; try searching for "yourlanguage stopwords". If you
529
+
find a list or decide to create your own list, you can also just place it in a standard text
530
+
file instead of a .json file and extract the stopwords using the extractor tool, for
531
+
example:
532
+
533
+
```sh
534
+
$ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=php > ./some/dir/LOCAL_CODE.php
535
+
$ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=pattern > ./some/dir/LOCAL_CODE.pattern
536
+
```
537
+
538
+
*Remember to replace `LOCAL_CODE` with the correct locale you wish to use.*
539
+
540
+
Here is an example text file containing stopwords that was copied and pasted from a
541
+
site: [stopwords_en_US](./console/stopwords_en_US.txt)
542
+
543
+
## To run tests
544
+
545
+
Unit testing is performed using PHPUnit v11.2 running on PHP v8.3.0+.
546
+
547
+
`./vendor/bin/phpunit tests`
548
+
549
+
## License
550
+
551
+
Released under MIT license (read LICENSE).
552
+