Don't let stemming reduce a word beneath 3 characters
5ac2ca121489 (Unpublished)

Description

Summary:
Ref T11922. The Porter stemmer reduces "DNS" (an acronym for "Domain Name System") to "dn", which is meaningless and too short to index.

Don't let stemming make an indexable token un-indexable by shortening it: if the stem is too short, just return the normalized input.

(I believe there are very few legitimate English words that have two-letter roots, anyway.)
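
The guard described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this commit: the names `MIN_TOKEN_LENGTH`, `normalize`, `toy_stem`, and `stem_token` are hypothetical, and `toy_stem` is a stand-in that covers only the plural rules from step 1a of the Porter algorithm, which is enough to reproduce the "DNS" to "dn" case.

```python
# Assumed minimum indexable token length, taken from the commit title.
MIN_TOKEN_LENGTH = 3


def normalize(token: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase."""
    return token.strip().lower()


def toy_stem(word: str) -> str:
    """Stand-in stemmer: only Porter step 1a (plural suffix) rules."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # tokens -> token, dns -> dn
    return word


def stem_token(token: str) -> str:
    """Stem a token, but never shorten it below the indexable length."""
    normalized = normalize(token)
    stem = toy_stem(normalized)
    if len(stem) < MIN_TOKEN_LENGTH:
        # The stem is too short to index: fall back to the normalized input.
        return normalized
    return stem
```

With this sketch, `stem_token("DNS")` returns `"dns"` rather than `"dn"`, while longer words such as `"tokens"` are still stemmed to `"token"`.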

Test Plan: Added unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11922

Differential Revision: https://secure.phabricator.com/D17001

Details

Provenance
epriestley <git@epriestley.com> Authored on Dec 6 2016, 5:26 PM
Parents
rPHUTIL7009bcd3fb9b: Give PhutilClassMapQuery a public cache key
Branches
Unknown
Tags
Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPHUTIL5ac2ca121489: Don't let stemming reduce a word beneath 3 characters (authored by epriestley <git@epriestley.com>). Dec 6 2016, 6:10 PM