Diff: STRATO-apps/wordpress_03/app/wp-content/plugins/aimogen-pro/res/rake-php-plus/README.md

No baseline file; diff is against an empty file only.
1 -
1 + # rake-php-plus
2 + A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).
3 +
4 + [![Latest Stable Version](https://poser.pugx.org/donatello-za/rake-php-plus/v/stable)](https://packagist.org/packages/donatello-za/rake-php-plus)
5 + [![Total Downloads](https://poser.pugx.org/donatello-za/rake-php-plus/downloads)](https://packagist.org/packages/donatello-za/rake-php-plus)
6 + [![License](https://poser.pugx.org/donatello-za/rake-php-plus/license)](https://packagist.org/packages/donatello-za/rake-php-plus)
7 +
8 + ## Introduction
9 +
10 + Keywords describe the main topics expressed in a document/text. Keyword *extraction*, in turn, is the process of pulling the important words and phrases out of a text.
11 +
12 + Extracted keywords can be used for things like:
13 + - Building a list of useful tags out of a larger text
14 + - Building search indexes and search engines
15 + - Grouping similar content by its topic
16 +
17 + Extracted phrases can be used for things like:
18 + - Highlighting important areas of a larger text
19 + - Language or documentation analysis
20 + - Building intelligent searches based on contextual terms
21 +
22 + This library provides an easy method for PHP developers to get a list of keywords and phrases from a string of text
23 + and is based on another smaller and unmaintained project called [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) by Richard Filipčík,
24 + which is a translation from a Python implementation simply called [RAKE](https://github.com/aneesha/RAKE).
25 +
26 + > *As described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010).
27 + [Automatic Keyword Extraction from Individual Documents](https://www.researchgate.net/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents).
28 + In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.*
29 +
30 + This particular package intends to include the following benefits over the original [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) package:
31 +
32 + 1. [PSR-2](http://www.php-fig.org/psr/psr-2/) coding standards.
33 + 2. [PSR-4](http://www.php-fig.org/psr/psr-4/) autoloading, so the package is [Composer](https://getcomposer.org) installable.
34 + 3. Additional functionality such as method chaining.
35 + 4. Multiple ways to provide source stopwords.
36 + 5. Full unit test coverage.
37 + 6. Performance improvements.
38 + 7. Improved documentation.
39 + 8. Easy language integration and multibyte string support.
40 +
41 + ## Currently Supported Languages
42 +
43 + * Afrikaans (af_ZA)
44 + * Arabic (United Arab Emirates)/الإمارات العربية المتحدة (ar_AE)
45 + * Brazilian Portuguese/português do Brasil (pt_BR)
46 + * English US (en_US)
47 + * European Portuguese/português europeu (pt_PT)
48 + * French/le français (fr_FR)
49 + * German (Germany)/Deutsch (Deutschland) (de_DE)
50 + * Italian/italiano (it_IT)
51 + * Polish/język polski (pl_PL)
52 + * Russian/русский язык (ru_RU)
53 + * Sorani Kurdish/سۆرانی (ckb_IQ)
54 + * Spanish/español (es_AR)
55 + * Tamil/தமிழ் (ta_TA)
56 + * Turkish/Türkçe (tr_TR)
57 + * Persian/Farsi/فارسی (fa_IR)
58 + * Dutch/Nederlands (nl_NL)
59 + * Swedish/svenska (sv_SE)
60 +
61 + > If your language is not listed here, it can be added. See the section
62 + called **[How to add additional languages](#how-to-add-additional-languages)** at the bottom of the page.
63 +
64 + ## Version
65 +
66 + v1.0.19
67 +
68 + ## Special Thanks
69 +
70 + * [Jarosław Wasilewski](https://github.com/Orajo): Polish language and improving multi-byte support.
71 + * [Lev Morozov](https://github.com/levmorozov): French and Russian languages.
72 + * [Igor Carvalho](https://github.com/Carvlho): Brazilian Portuguese language.
73 + * [Khoshbin Ali Ahmed](https://github.com/Xoshbin): Sorani Kurdish and Arabic languages.
74 + * [RhaPT](https://github.com/RhaPT): European Portuguese language.
75 + * [Peter Thaleikis](https://github.com/spekulatius): German language.
76 + * [Yusuf Usta](https://github.com/yusufusta): Turkish language.
77 + * [orthosie](https://github.com/orthosie): Tamil language.
78 + * [ScIEnzY](https://github.com/ScIEnzY): Italian language.
79 + * [Reza Rabbani](https://github.com/thrashzone13): Persian language.
80 + * [Anne van der Aar](https://github.com/annevanderaar): Dutch language.
81 +
82 + ## Installation
83 +
84 + ### With Composer
85 +
86 + ```bash
87 + $ composer require donatello-za/rake-php-plus
88 + ```
89 +
90 +
91 + ```json
92 + {
93 + "require": {
94 + "donatello-za/rake-php-plus": "^1.0"
95 + }
96 + }
97 + ```
98 +
99 + ```php
100 + <?php
101 + require 'vendor/autoload.php';
102 +
103 + use DonatelloZa\RakePlus\RakePlus;
104 + ```
105 +
106 + ### Without Composer
107 +
108 + ```php
109 + <?php
110 +
111 + require 'path/to/AbstractStopwordProvider.php';
112 + require 'path/to/ILangParseOptions.php';
113 + require 'path/to/LangParseOptions.php';
114 + require 'path/to/StopwordArray.php';
115 + require 'path/to/StopwordsPatternFile.php';
116 + require 'path/to/StopwordsPHP.php';
117 + require 'path/to/RakePlus.php';
118 +
119 + use DonatelloZa\RakePlus\RakePlus;
120 +
121 + ```
122 +
123 + ## Example 1
124 +
125 + Creates a new instance of RakePlus, extracts the phrases, and returns the results. Assumes that the specified
126 + text is English (US).
127 +
128 +
129 + ```php
130 + use DonatelloZa\RakePlus\RakePlus;
131 +
132 + $text = "Criteria of compatibility of a system of linear Diophantine equations, " .
133 + "strict inequations, and nonstrict inequations are considered. Upper bounds " .
134 + "for components of a minimal set of solutions and algorithms of construction " .
135 + "of minimal generating sets of solutions for all types of systems are given.";
136 +
137 + $phrases = RakePlus::create($text)->get();
138 +
139 + print_r($phrases);
140 + ```
141 +
142 + ```
143 + Array
144 + (
145 + [0] => criteria
146 + [1] => compatibility
147 + [2] => system
148 + [3] => linear diophantine equations
149 + [4] => strict inequations
150 + [5] => nonstrict inequations
151 + [6] => considered
152 + [7] => upper bounds
153 + [8] => components
154 + [9] => minimal set
155 + [10] => solutions
156 + [11] => algorithms
157 + [12] => construction
158 + [13] => minimal generating sets
159 + [14] => types
160 + [15] => systems
161 + )
162 + ```
163 +
164 + ## Example 2
165 +
166 + Creates a new instance of RakePlus, extracts the phrases in different orders,
167 + and also shows how to get the phrase scores.
168 +
169 + ```php
170 + use DonatelloZa\RakePlus\RakePlus;
171 +
172 + $text = "Criteria of compatibility of a system of linear Diophantine equations, " .
173 + "strict inequations, and nonstrict inequations are considered. Upper bounds " .
174 + "for components of a minimal set of solutions and algorithms of construction " .
175 + "of minimal generating sets of solutions for all types of systems are given.";
176 +
177 + // Note: en_US is the default language.
178 + $rake = RakePlus::create($text, 'en_US');
179 +
180 + // 'asc' is optional and is the default sort order
181 + $phrases = $rake->sort('asc')->get();
182 + print_r($phrases);
183 + ```
184 +
185 + ```
186 + Array
187 + (
188 + [0] => algorithms
189 + [1] => compatibility
190 + [2] => components
191 + [3] => considered
192 + [4] => construction
193 + [5] => criteria
194 + [6] => linear diophantine equations
195 + [7] => minimal generating sets
196 + [8] => minimal set
197 + [9] => nonstrict inequations
198 + [10] => solutions
199 + [11] => strict inequations
200 + [12] => system
201 + [13] => systems
202 + [14] => types
203 + [15] => upper bounds
204 + )
205 + ```
206 +
207 + ```php
208 + // Sort in descending order
209 + $phrases = $rake->sort('desc')->get();
210 + print_r($phrases);
211 + ```
212 +
213 + ```
214 + Array
215 + (
216 + [0] => upper bounds
217 + [1] => types
218 + [2] => systems
219 + [3] => system
220 + [4] => strict inequations
221 + [5] => solutions
222 + [6] => nonstrict inequations
223 + [7] => minimal set
224 + [8] => minimal generating sets
225 + [9] => linear diophantine equations
226 + [10] => criteria
227 + [11] => construction
228 + [12] => considered
229 + [13] => components
230 + [14] => compatibility
231 + [15] => algorithms
232 + )
233 + ```
234 +
235 + ```php
236 + // Sort the phrases by score and return the scores
237 + $phrase_scores = $rake->sortByScore('desc')->scores();
238 + print_r($phrase_scores);
239 + ```
240 +
241 + ```
242 + Array
243 + (
244 + [linear diophantine equations] => 9
245 + [minimal generating sets] => 8.5
246 + [minimal set] => 4.5
247 + [strict inequations] => 4
248 + [nonstrict inequations] => 4
249 + [upper bounds] => 4
250 + [criteria] => 1
251 + [compatibility] => 1
252 + [system] => 1
253 + [considered] => 1
254 + [components] => 1
255 + [solutions] => 1
256 + [algorithms] => 1
257 + [construction] => 1
258 + [types] => 1
259 + [systems] => 1
260 + )
261 + ```
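The scores above are consistent with the usual RAKE scoring rule: each word scores its degree (the total length of the phrases it appears in) divided by its frequency, and a phrase scores the sum of its word scores. The following is a minimal, self-contained sketch of that rule, not the library's actual code; the function name `rake_phrase_scores` is hypothetical and the input is assumed to be a list of already extracted, lowercased phrases.

```php
<?php
// Sketch of RAKE phrase scoring (not the library's code): each word scores
// degree / frequency, and a phrase scores the sum of its word scores.

/**
 * @param string[] $phrases Candidate phrases, already lowercased.
 * @return array<string, float> Phrase => score.
 */
function rake_phrase_scores(array $phrases): array
{
    $frequency = [];
    $degree = [];

    foreach ($phrases as $phrase) {
        $words = explode(' ', $phrase);
        foreach ($words as $word) {
            $frequency[$word] = ($frequency[$word] ?? 0) + 1;
            // Degree counts every word in the phrase, including the word itself.
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    $scores = [];
    foreach ($phrases as $phrase) {
        $score = 0.0;
        foreach (explode(' ', $phrase) as $word) {
            $score += $degree[$word] / $frequency[$word];
        }
        $scores[$phrase] = $score;
    }

    return $scores;
}

$scores = rake_phrase_scores([
    'linear diophantine equations',
    'minimal set',
    'minimal generating sets',
    'solutions',
]);
print_r($scores);
```

Here "minimal" occurs in two phrases of lengths 2 and 3, so it scores 5 / 2 = 2.5, which is why "minimal generating sets" comes out at 8.5 rather than 9.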
262 +
263 + ```php
264 + // Extract phrases from a new string on the same RakePlus instance. Using the
265 + // same RakePlus instance is faster than creating a new instance as the
266 + // language files do not have to be re-loaded and parsed.
267 +
268 + $text = "A fast Fourier transform (FFT) algorithm computes...";
269 + $phrases = $rake->extract($text)->sort()->get();
270 + print_r($phrases);
271 + ```
272 +
273 + ```
274 + Array
275 + (
276 + [0] => algorithm computes
277 + [1] => fast fourier transform
278 + [2] => fft
279 + )
280 + ```
281 +
282 + ## Example 3
283 +
284 + Creates a new instance of RakePlus and extracts the unique keywords from the phrases.
285 +
286 + ```php
287 + use DonatelloZa\RakePlus\RakePlus;
288 +
289 + $text = "Criteria of compatibility of a system of linear Diophantine equations, " .
290 + "strict inequations, and nonstrict inequations are considered. Upper bounds " .
291 + "for components of a minimal set of solutions and algorithms of construction " .
292 + "of minimal generating sets of solutions for all types of systems are given.";
293 +
294 + $keywords = RakePlus::create($text)->keywords();
295 + print_r($keywords);
296 + ```
297 +
298 + ```
299 + Array
300 + (
301 + [0] => criteria
302 + [1] => compatibility
303 + [2] => system
304 + [3] => linear
305 + [4] => diophantine
306 + [5] => equations
307 + [6] => strict
308 + [7] => inequations
309 + [8] => nonstrict
310 + [9] => considered
311 + [10] => upper
312 + [11] => bounds
313 + [12] => components
314 + [13] => minimal
315 + [14] => set
316 + [15] => solutions
317 + [16] => algorithms
318 + [17] => construction
319 + [18] => generating
320 + [19] => sets
321 + [20] => types
322 + [21] => systems
323 + )
324 + ```
325 +
326 + ## Example 4
327 +
328 + Creates a new instance of RakePlus without using the static RakePlus::create method.
329 +
330 + ```php
331 + use DonatelloZa\RakePlus\RakePlus;
332 +
333 + $text = "Criteria of compatibility of a system of linear Diophantine equations, " .
334 + "strict inequations, and nonstrict inequations are considered. Upper bounds " .
335 + "for components of a minimal set of solutions and algorithms of construction " .
336 + "of minimal generating sets of solutions for all types of systems are given.";
337 +
338 + $rake = new RakePlus();
339 + $phrases = $rake->extract($text)->get();
340 +
341 + // Alternative method:
342 + $phrases = (new RakePlus($text))->get();
343 + ```
344 +
345 + ## Example 5
346 +
347 + You can provide custom stopwords in four different ways:
348 +
349 + ```php
350 + use DonatelloZa\RakePlus\{RakePlus, StopwordArray};
351 +
352 + // 1: The standard way (provide a language code)
353 + // RakePlus will first look for ./lang/en_US.pattern, if
354 + // not found, it will look for ./lang/en_US.php.
355 + $rake = RakePlus::create($text, 'en_US');
356 +
357 + // 2: Pass an array containing stopwords
358 + $rake = RakePlus::create($text, ['a', 'able', 'about', 'above', ...]);
359 +
360 + // 3: Pass the name of a PHP or pattern file,
361 + // see lang/en_US.php and lang/en_US.pattern for examples.
362 + $rake = RakePlus::create($text, '/path/to/my/stopwords.pattern');
363 +
364 + // 4: Create an instance of one of the stopword provider classes (or
365 + // create your own) and pass that to RakePlus:
366 + $stopwords = StopwordArray::create(['a', 'able', 'about', 'above', ...]);
367 + $rake = RakePlus::create($text, $stopwords);
368 + ```
369 +
370 + ## Example 6
371 +
372 + You can specify the minimum number of characters that a phrase/keyword
373 + must have; anything shorter than the minimum is filtered out. The
374 + default is 0 (no minimum).
375 +
376 + ```php
377 + use DonatelloZa\RakePlus\RakePlus;
378 +
379 + $text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';
380 +
381 + // Without a minimum
382 + $phrases = RakePlus::create($text, 'en_US', 0)->get();
383 + print_r($phrases);
384 + ```
385 +
386 + ```
387 + Array
388 + (
389 + [0] => crest suite
390 + [1] => 413 lake carlietown
391 + [2] => wa 12643
392 + )
393 + ```
394 +
395 + ```php
396 + // With a minimum
397 + $phrases = RakePlus::create($text, 'en_US', 10)->get();
398 +
399 + print_r($phrases);
400 + ```
401 +
402 + ```
403 + Array
404 + (
405 + [0] => crest suite
406 + [1] => 413 lake carlietown
407 + )
408 + ```
409 +
410 + ## Example 7
411 +
412 + You can specify whether phrases/keywords that consist of a number
413 + only should be filtered out or not. The default is to filter out
414 + numerics.
415 +
416 + ```php
417 + use DonatelloZa\RakePlus\RakePlus;
418 +
419 + $text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';
420 +
421 + // Filter out numerics
422 + $phrases = RakePlus::create($text, 'en_US', 0, true)->get();
423 + print_r($phrases);
424 + ```
425 +
426 + ```
+ Array
427 + (
428 + [0] => crest suite
429 + [1] => 413 lake carlietown
430 + [2] => wa 12643
431 + )
432 + ```
433 +
434 + ```php
435 + // Do not filter out numerics
436 + $phrases = RakePlus::create($text, 'en_US', 0, false)->get();
437 +
438 + print_r($phrases);
439 + ```
440 +
441 + ```
442 + Array
443 + (
444 + [0] => 6462
445 + [1] => crest suite
446 + [2] => 413 lake carlietown
447 + [3] => wa 12643
448 + )
449 + ```
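The two filters from Examples 6 and 7 can be pictured as a simple pass over the extracted phrases. The sketch below is an illustration only, not the library's implementation: the function name `filter_phrases` is hypothetical, and it assumes that the length check is inclusive (`>=`) and that "numeric" means the whole phrase is a number.

```php
<?php
// Sketch only: the two filters described in Examples 6 and 7, applied to a
// plain array. This is not the library's implementation.

function filter_phrases(array $phrases, int $minLength = 0, bool $filterNumerics = true): array
{
    $kept = array_filter($phrases, function (string $phrase) use ($minLength, $filterNumerics) {
        if ($filterNumerics && is_numeric($phrase)) {
            return false; // the whole phrase is a number, e.g. "6462"
        }
        // Assumption: a phrase of exactly $minLength characters is kept.
        return mb_strlen($phrase) >= $minLength;
    });

    return array_values($kept); // re-index from 0
}

$phrases = ['6462', 'crest suite', '413 lake carlietown', 'wa 12643'];

print_r(filter_phrases($phrases, 0, true));   // drops "6462"
print_r(filter_phrases($phrases, 10, true));  // also drops "wa 12643" (8 characters)
print_r(filter_phrases($phrases, 0, false));  // keeps everything
```

Note that "413 lake carlietown" survives the numeric filter because only phrases that are entirely numeric are removed.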
450 +
451 + ## How to add additional languages
452 +
453 + The library requires a list of "stopwords" for each language. Stopwords are common words used in a language such as "and", "are", "or", etc.
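To make the role of stopwords concrete: RAKE treats stopwords and punctuation as delimiters, and whatever text remains between them becomes a candidate phrase. The following is a rough, self-contained sketch of that idea. The function name `candidate_phrases`, the short stopword list and the regular expressions are all assumptions made for illustration and are not taken from the library.

```php
<?php
// Sketch only: split text into candidate phrases at stopwords and punctuation.
// The stopword list and the regular expressions are illustrative assumptions.

function candidate_phrases(string $text, array $stopwords): array
{
    // Build one alternation pattern that matches any stopword as a whole word.
    $quoted = array_map(fn ($w) => preg_quote($w, '/'), $stopwords);
    $stopwordPattern = '/\b(?:' . implode('|', $quoted) . ')\b/iu';

    $phrases = [];
    // Break the text into fragments at punctuation first.
    foreach (preg_split('/[.,;:!?()]+/u', mb_strtolower($text)) as $fragment) {
        // Then break each fragment wherever a stopword occurs.
        foreach (preg_split($stopwordPattern, $fragment) as $piece) {
            $piece = trim(preg_replace('/\s+/u', ' ', $piece));
            if ($piece !== '') {
                $phrases[] = $piece;
            }
        }
    }

    return $phrases;
}

print_r(candidate_phrases(
    'Upper bounds for components of a minimal set of solutions are given.',
    ['for', 'of', 'a', 'are', 'given']
));
```

With this tiny stopword list the sentence splits into "upper bounds", "components", "minimal set" and "solutions", which matches the corresponding phrases in Example 1.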
454 +
455 + There are [stopwords for 50 languages](https://github.com/Donatello-za/stopwords-json#languages) (including the ones already supported) available in JSON format.
456 + If you are lucky enough to have your language listed then you can easily import it into the library. To
457 + do so, read the section below:
458 +
459 + **Using the stopwords extractor tool**
460 +
461 + > Note: These instructions assume you are using Linux
462 +
463 + We will be using the Greek language as an example:
464 +
465 + 1. Check whether your operating system has the Greek localisation files. The Greek locale
466 + code to look for is `el_GR`, so run the command `$ locale -a` to see if it is listed.
467 + 2. If it is not listed, you'll need to create it, so run:
468 +
469 + ```sh
470 + sudo locale-gen el_GR
471 + sudo locale-gen el_GR.utf8
472 + ```
473 +
474 + 3. Go to the [list of stopword files](https://github.com/Donatello-za/stopwords-json#languages) and
475 + find the Greek language. The file will be called `el.json` and it will contain 75 stopwords.
476 + 4. Download the `el.json` file and store it somewhere on your system.
477 + 5. In your terminal, go to the directory of the `rake-php-plus` library. It will
478 + be under `vendor/donatello-za/rake-php-plus` if you used Composer to install it.
479 +
480 + We now need to use the JSON file to create two new files, one will be a `.php` file
481 + that contains the stopwords as a PHP array and one will be a `.pattern` file which
482 + is a text file containing the stopwords as a regular expression:
483 +
484 + 1. Extract and convert the .json file to a PHP file by running:
485 +
486 + ```sh
487 + $ php ./console/extractor.php path/to/el.json --locale=el_GR --output=php > ./some/dir/el_GR.php
488 + ```
489 +
490 + 2. Extract and convert the .json file to a .pattern file by running:
491 +
492 + ```sh
493 + $ php ./console/extractor.php path/to/el.json --locale=el_GR --output=pattern > ./some/dir/el_GR.pattern
494 + ```
495 +
496 + That is it! You can now use the new stopwords by specifying them when creating an instance
497 + of the RakePlus class, for example:
498 +
499 + ```php
500 + $rake = RakePlus::create($text, '/some/dir/el_GR.pattern');
501 + ```
502 +
503 + or
504 +
505 + ```php
506 + $rake = RakePlus::create($text, '/some/dir/el_GR.php');
507 + ```
508 +
509 + **Contribute by Adding a Language**
510 +
511 + If you want your language to be officially supported, you can fork this library,
512 + generate the `.pattern` and `.php` stopword files as described above, place them
513 + in the `./rake-php-plus/lang/` directory and submit it as a pull request.
514 +
515 + Once your language is officially supported, you'll be able to specify the language
516 + without having to specify the file to use, for example:
517 +
518 + ```php
519 + $rake = RakePlus::create($text, 'el_GR');
520 + ```
521 +
522 + RakePlus will always look for a `.pattern` file first and, if not found, it will
523 + look for a `.php` file in the `./lang/` directory.
524 +
525 + **I don't have a stopwords file for my language, what now?**
526 +
527 + If your language is not covered in the [list of 50 languages here](https://github.com/Donatello-za/stopwords-json#languages)
528 + you may have to find it elsewhere. Try searching for "yourlanguage stopwords". If you
529 + find a list or decide to create your own list, you can also just place it in a standard text
530 + file instead of a .json file and extract the stopwords using the extractor tool, for
531 + example:
532 +
533 + ```sh
534 + $ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=php > ./some/dir/LOCAL_CODE.php
535 + $ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=pattern > ./some/dir/LOCAL_CODE.pattern
536 + ```
537 +
538 + *Remember to replace `LOCAL_CODE` with the locale code you wish to use.*
539 +
540 + Here is an example text file containing stopwords that was copied and pasted from a
541 + site: [stopwords_en_US](./console/stopwords_en_US.txt)
542 +
543 + ## To run tests
544 +
545 + Unit testing is performed using PHPUnit v11.2 running on PHP v8.3.0+.
546 +
547 + `./vendor/bin/phpunit tests`
548 +
549 + ## License
550 +
551 + Released under MIT license (read LICENSE).
552 +