Compact encoding of multi-lingual translation dictionaries

Date: 2021-11-26 Source: 乌哈旅游
Patent content provided by 知识产权出版社 (Intellectual Property Publishing House)

Patent title: Compact encoding of multi-lingual translation dictionaries

Inventors: Ronald M. Kaplan, Atty T. Mullins
Application number: US08/657229
Filing date: 1996-06-03
Publication number: US05787386A
Publication date: 1998-07-28

Abstract: A computerized multilingual translation dictionary includes a set of words and phrases for each of the languages it contains, plus a mapping that indicates for each word or phrase in one language what the corresponding translations in the other languages are. The set of words and phrases for each language is divided up among corresponding concept groups based on an abstract pivot language. The words and phrases are encoded as token numbers assigned by a word-number mapper, laid out in a sequence that can be searched fairly rapidly with a simple linear scan. The complex associations of words and phrases to particular pivot-language senses are represented by including a list of pivot-language sense numbers with each word or phrase. The preferred coding of these sense numbers is by means of a bit vector for each word, where each bit corresponds to a particular pivot element in the abstract language, and the bit is ON if the given word is a translation of that pivot element. Determining whether a word in language 1 translates to a word in language 2 then requires only a bit-wise intersection of their associated bit vectors. Each word or phrase is prefixed by its bit-vector token number, so the bit-vector tokens do double duty: they also act as separators between the tokens of one phrase and those of another. A pseudo-Huffman compression scheme is used to reduce the size of the token stream. Because of the frequency skew for the bit-vector tokens, this produces a very compact encoding.
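The bit-vector test and the separator role of the bit-vector tokens described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The pivot senses, the toy English and French entries, the `Codec` class, and the convention that token numbers below `word_base` are bit-vector tokens are all assumptions made for this illustration. The pseudo-Huffman compression step is omitted because the abstract gives no details of it.

```python
from typing import List, Tuple

# Hypothetical pivot-language senses; bit i corresponds to PIVOT_SENSES[i].
PIVOT_SENSES = ["DOG", "RIVER_BANK", "MONEY_BANK"]

Entry = Tuple[int, str]  # (bit vector of pivot senses, word or phrase)


def sense_vector(*senses: str) -> int:
    """Return a bit vector with one bit ON per listed pivot sense."""
    vec = 0
    for sense in senses:
        vec |= 1 << PIVOT_SENSES.index(sense)
    return vec


def is_translation(vec_a: int, vec_b: int) -> bool:
    """Two entries translate each other if they share at least one pivot sense."""
    return (vec_a & vec_b) != 0


# Toy lexicons, invented for this example.
ENGLISH: List[Entry] = [
    (sense_vector("DOG"), "dog"),
    (sense_vector("RIVER_BANK", "MONEY_BANK"), "bank"),
    (sense_vector("RIVER_BANK"), "river bank"),
]
FRENCH: List[Entry] = [
    (sense_vector("DOG"), "chien"),
    (sense_vector("RIVER_BANK"), "rive"),
    (sense_vector("MONEY_BANK"), "banque"),
]


def translations(phrase: str, source: List[Entry], target: List[Entry]) -> List[str]:
    """Linear scan: collect target entries whose bit vector intersects the source's."""
    hits: List[str] = []
    for vec, candidate in source:
        if candidate == phrase:
            hits.extend(t for tvec, t in target if is_translation(vec, tvec))
    return hits


class Codec:
    """Assumed layout: token numbers below word_base name a bit vector,
    all other token numbers name a word."""

    def __init__(self, entries: List[Entry]) -> None:
        self.vectors = sorted({vec for vec, _ in entries})
        self.words = sorted({w for _, phrase in entries for w in phrase.split()})
        self.word_base = len(self.vectors)

    def encode(self, entries: List[Entry]) -> List[int]:
        stream: List[int] = []
        for vec, phrase in entries:
            # The bit-vector token prefixes the phrase and marks where it starts.
            stream.append(self.vectors.index(vec))
            stream.extend(self.word_base + self.words.index(w) for w in phrase.split())
        return stream

    def decode(self, stream: List[int]) -> List[Entry]:
        entries: List[Tuple[int, List[str]]] = []
        for tok in stream:
            if tok < self.word_base:
                entries.append((self.vectors[tok], []))  # separator: new entry begins
            else:
                entries[-1][1].append(self.words[tok - self.word_base])
        return [(vec, " ".join(ws)) for vec, ws in entries]
```

In this sketch, `translations("bank", ENGLISH, FRENCH)` scans the French entries and keeps those whose bit vector shares a bit with the vector for "bank". No explicit end-of-phrase marker is stored: `decode` starts a new entry whenever it meets a bit-vector token, which is the double duty the abstract describes.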

Applicant: XEROX CORPORATION

Agent: Fay, Sharpe, Beall, Fagan, Minnich & McKee
