Tuesday, April 11, 2006

Some Problems With Transliteration

My Hebrew transliteration program, now in beta, still has some problems, because some features have not been fully implemented. I'll document some of them here. First, I would just note that I've added support for another transcription scheme, namely "Michigan Claremont." Since this scheme does not denote many of the features my program detects, the mapping loses information. For example, that scheme does not distinguish between sheva na and sheva nach, between the different degeshim, or between kametz gadol and kametz katan.
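To make the loss concrete, here is a minimal sketch of a many-to-one mapping. The feature names are hypothetical labels, not the ones my program uses, and the symbols (":" for sheva, "." for dagesh, "F" for kametz) reflect my understanding of the Michigan-Claremont encoding, so treat them as assumptions.

```python
# Hypothetical internal feature labels mapped to Michigan-Claremont symbols.
# Assumption: ":" = sheva, "." = dagesh, "F" = kametz in that scheme.
MC_SYMBOL = {
    "SHEVA_NA": ":",
    "SHEVA_NACH": ":",
    "DAGESH_KAL": ".",
    "DAGESH_CHAZAK": ".",
    "KAMETZ_GADOL": "F",
    "KAMETZ_KATAN": "F",
}

def lost_distinctions(mapping):
    """Group features by output symbol and keep the groups that collide."""
    by_symbol = {}
    for feature, symbol in mapping.items():
        by_symbol.setdefault(symbol, []).append(feature)
    return {s: sorted(f) for s, f in by_symbol.items() if len(f) > 1}
```

Calling lost_distinctions(MC_SYMBOL) lists every pair of features that ends up as the same symbol, which is exactly the information that cannot be recovered from the output.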

The program still has a way to go. A good illustration of this is how it handles the 12th perek of Vayikra. I will give here the academic transliteration generated by my program and highlight errors in red.

wayḏabbēr ʾăḏōnāy, ʾel-mōšeh llēʾmōr
dabbēr ʾel-bənê yiśrāʾēl, lēʾmōr, ʾiššāh kî ṯazrîaʿ, wəyāləḏāh zāḵor--wəṭāməʾāh šiḇʿaṯ yāmîm, kîmê niddaṯ dəw‍ōṯāh tiṭmāʾ

zāḵor (it should be zāḵār) was a hypercorrection, triggered by the dashes there. Basically, when a word ends in a syllable with a kametz and is connected to the next word by a makef, that kametz is a kametz katan, reduced from a cholam. Here, the -- was punctuation from mechon-mamre showing a specific type of division, and this confused my program into thinking that the kametz was reduced from a cholam. I need to distinguish between a single dash for makef and a double dash for mechon-mamre's trup-based punctuation.
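A minimal sketch of that distinction, assuming the input arrives as a single chunk in which a makef is either an ASCII hyphen or U+05BE and mechon-mamre's division mark is a doubled hyphen. The token names and function names are hypothetical, not taken from my program.

```python
import re

# Hypothetical token kinds, for illustration only.
MAKEF = "MAKEF"
TRUP_BREAK = "TRUP_BREAK"
WORD = "WORD"

# The double dash is listed first so it is never read as two makefs.
_JOINER = re.compile(r"(--|-|\u05BE)")

def tokenize(chunk):
    """Split a chunk into words, makefs, and trup-based breaks."""
    tokens = []
    for piece in _JOINER.split(chunk):
        if piece == "":
            continue
        if piece == "--":
            tokens.append((TRUP_BREAK, piece))
        elif piece in ("-", "\u05BE"):
            tokens.append((MAKEF, piece))
        else:
            tokens.append((WORD, piece))
    return tokens

def followed_by_makef(tokens, i):
    """True only if the token right after position i is a real makef."""
    return i + 1 < len(tokens) and tokens[i + 1][0] == MAKEF
```

With this in place, the kametz katan rule would fire only when followed_by_makef is true, so a word sitting before -- would keep its kametz gadol.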

ûḇayyôm, haššəmînî, yimmôl, bəśar ʿārəlāṯô

I believe the first kametz in ʿārəlāṯô should be a kametz katan, since the word has the same form as chochmato. Along with that, the sheva after it should then be nach and elide. This is a rule to add to the kametz katan section.

ûšlōšîm yôm ûšlōšeṯ yāmîm, tēšēḇ biḏmê ṭohŏrāh; bəḵol-qōḏeš lōʾ-ṯiggāʿ, wəʾel-hammiqdāš lōʾ ṯāḇōʾ, ʿaḏ-məlōʾṯ, yəmê ṭohŏrāh
wəʾim-nəqēḇāh ṯēlēḏ, wəṭāməʾāh šəḇuʿayim kəniddāṯāh; wəšiššîm yôm wəšēšeṯ yāmîm, tēšēḇ ʿal-dəmê ṭohŏrāh
ûḇimlōʾṯ yəmê ṭohŏrāh, ləḇēn ʾô ləḇaṯ, tāḇîʾ keḇeś ben-šənāṯô ləʿōlāh, ûḇen-yônāh ʾô-ṯōr ləḥaṭṭāʾṯ--ʾel-peṯaḥ ʾōhel-môʿēḏ, ʾel-hakkōhēn
wəhiqrîḇô lip̄nê ʾăḏōnāy, wəḵipper ʿāle(y)hā, wəṭāhărāh, mimməqōr dāme(y)hā: zōʾṯ tôraṯ hayyōleḏeṯ, lazzāḵār ʾô lannəqēḇāh
wəʾim-lōʾ ṯimṣāʾ yāḏāh, dê śeh--wəlāqəḥāh šəṯDAGESH_UNKNOWNê-ṯōrîm ʾô šənê bənê yônāh, ʾeḥāḏ ləʿōlāh wəʾeḥāḏ ləḥaṭṭāʾṯ; wəḵipper ʿāle(y)hā hakkōhēn, wəṭāhērāh
END PEREK

Here the problem is the word שְׁתֵּי. The sheva seems to be na, since it appears at the beginning of a word, but if so, the dagesh in the tav should not exist. The dagesh (kal) could exist in the tav if the sheva preceding it were nach, but a sheva at the beginning of a word should not be nach. This common word seems to contradict various phonological rules -- indeed, because it is a common word, it probably lasted through various historical developments in the Hebrew language. At any rate, we need to treat this as a special case, probably with a sheva nach under the shin.
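One way to handle such a special case is an exception table consulted before the general rules. This is only a sketch: the table, the function names, and the general_rules callback are hypothetical, and the output "štê" assumes the sheva is indeed treated as nach. The lookup normalizes the input first, since the same pointed word can arrive with its vowel points and dagesh in a different order.

```python
import unicodedata

def _norm(word):
    # NFC puts combining marks (points, dagesh, shin dot) in canonical order.
    return unicodedata.normalize("NFC", word)

# Hypothetical exception table: pointed Hebrew word -> transliteration.
EXCEPTIONS = {
    # shin, sheva, shin dot, tav, tzere, dagesh, yod
    _norm("\u05E9\u05B0\u05C1\u05EA\u05B5\u05BC\u05D9"): "\u0161t\u00EA",  # štê
}

def transliterate_word(word, general_rules):
    """Return the exception entry if there is one, else apply the rules."""
    key = _norm(word)
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    return general_rules(word)
```

Keeping exceptions in a table rather than in the rule code means further irregular words can be added without touching the sheva or dagesh logic.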

Why I've been absent the past few weeks:
In general, I've been busy doing things in the real world, relating to trying to advance myself towards a PhD, preparing for classes and for Pesach, and taking care of Meir (who wants the computer for himself to "work" or watch his Baby Einstein DVDs). Hope to pick up blogging again soon.

I'll note here some noteworthy blogs which have been created recently, or which I chanced upon. Mendy has started blogging again. I'm sure you've seen this already, but S. started a new blog, English Hebraica. And I've been reading the chocolate lady's blog, אין מױל ארײן, mostly in an effort to improve my Yiddish, for my own top-secret (but really cool) purposes.

1 comment:

Mississippi Fred MacDowell said...

Thanks for the link (and having such a wonderful blog!).

Chag kasher vesameach to you. : )
