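I had gathered several K'iche' texts, so I tried generating K'iche' sentences with a Markov chain.
Four results are shown here. The first uses text from several picture books published by the Guatemalan Ministry of Education as the source.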
Sentence 1: Pix pix, pix tz'ikin rumal kenumik.
Sentence 2: ¡In ri' ra'nan! Sib'alaj maltyox at am, xattob'an che jun.
Sentence 3: Ri winaq xkimajij utikik che'.
Sentence 4: Ri kinan kitat xub'ij jun saqb'in tajin kutij kichapik le alaj kuk.
Sentence 5: wachanim xa oxlajuj chik k'olik.
Sentence 6: Wachanim xa kajib' chik k'olik.
Sentence 7: B'e, b'e, b'e kacha' jun oqxa'n.
Sentence 8: Ri che’ k’o jun me’s chuchi’ ri jastaq rech.
Sentence 9: Ri inlaj ala ¿Jas le alaj taq ak', le jun riqow ib', k'ate k'uri' sib'alaj kaki'kotik, aq'ab' xwa'lijik, xa rumal xutaq uloq we xtijtaj jastaq rumal man xk'iy taj.
Sentence 10: Jun nim tata’ xub’ij che le wachi'l.
Next, the Microsoft Services Agreement.
Sentence 1: Xuquje', wene' juntaq tz'ib'wuj chik ri q'atem eche che taq patanib'al riqom apan qumal uj ruk'a'm rib' ruk' ri toq'inik ri'.
Sentence 2: We je ka'el cho qawach chi k'o ta k'o apanoq chupam ri Pataninik kech taq b'anowem ri Pataninik ri Pataninik tajin kchakux na (versi?
Sentence 3: Arechi k'o jun urox, rumal Microsoft( jas kub'an Microsoft Studios, kojkemchak, pataninik xuquje' salob'em wachib'al sik'inik on upam ri patanib'al on B'ixkil ri Ech lal, jas ta apanoq ri Pataninik rech le software xuquje' uqasaxik xuquje' che jun tojom uperaj ri patanib'al, on ri k'olib'al pa nimk'at uchapom rib' ruk' ri b'ixkil chi uxe' taq k'utunik on wene' kuk'exla' rib' jas ri kk'ulmatajik arechi' tz'aptal ri k'olib'al lal pa http://support.
Sentence 4: We man kk'am lal apan on kkoj lal k'ya taq uwach toq'inik ri Windows, k'ate k'uri' ri jak'ayb'al arechi' b'im chi uxe' taq Urox.
Sentence 5: Ri ilob'al rech Microsoft, k'olib'al lal pa Skype on che uk'amik taq le wokaj la on kajach apanoq chi upam taq uwach che jun tojom uperaj ri Microsoft chisaq uya'om b'e lal ka'ok ta lal kk'amwaj lal ri k'olib'al pa Skype xuquje', we tajin kkoj lal apan ri b'i'aj jas jun b'ixkil (jas ta ne', ujachik q'ojom ri q'atem eche che taq yuk'unem.
Sentence 6: Tza eqle'n lal kiPataninik taq ri Pataninik; xuquje' ri Pataninik on nuk'um ch'ich'; wuqub'.
Sentence 7: We je ri' wene' man kutoq'aj taj kq'axej lal cho taqanik" jawje chi' ka'el wi jas ri k'olib'al k'o jas jun che uq'echik we taqanik.
Sentence 8: We je ri' k'o upatan.
Sentence 9: Tzij b'anom che taq ri Jak'ayb'al rech Windows.
Sentence 10: Eqle'n lal uya'ik apan ronojel ri Upam Ech lal man kuq'ech ta apachike jachanik che Microsoft man rumal ta wujil che juntaq tz'ib'wuj on uwach chik ri chakub'al on che ya'tal rumal ri ya'om b'i lal jun nim apan k'ex che, kuya' kaqatoq'ij apan chech ukojik le patanib'al rech Office, ri yuk'unem, k'o ch'ich'am eche lal kuya' ta apan kumal nantat che okem che juperaj on ronojel taq pataninik ya'om kumal jub'an chik che taq rachi'l chak ri nab'e kanoq xuquje' che ri Pataninik wene' ktoq'ix lal rech Microsoft, we je kelik q'echonik kb'an lal.
Microsoft-related words appear in the output. Next, output based on part of the Bible (the Epistle to the Philippians).
Sentence 1: coj, chijiquiba' ba' u k'ij sak! Je chbanok.
Sentence 2: ch ajchac, chebato' ri ixokib ri'.
Sentence 3: yo'w nu chuk'ab.
Sentence 4: Xane' are quinwaj quinwil i tom, xukuje' ri c'olem pa ri Crist Jesus.
Sentence 5: Sibalaj quinquicot pa ri mebayil xukuje' ri nu tzij ri i ch'uch'ujil chetamax cumal conojel ri winak.
Sentence 6: Ri in quinquicotic pune' man ix cuininak ta che choman pa ri Ajawaxel.
Sentence 7: Weta'm chi xa' rech c?
Sentence 8: ch ajchac, chebato' ri ixokib ri'.
Sentence 9: chajin na ri Ajawaxel, lok'alaj tak wachalal, ri sibalaj quinwaj quinwil i w?
Sentence 10: C'o jun jasach mixoc wi il, xane' xuwi ri nu tzij re colobal ib, aretak xinito' pa ri wuj re c'aslemal.
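The spelling and other conventions differ from those commonly used for K'iche' today. Finally, output based on part of the Popol Vuh, the sacred book of the K'iche' people.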
Sentence 1: Ta xkich'ik k'ut e nab'e uchan: maja' b'i oq jun winaq, jun chikop, nima chikop ri ulew xuya' o Alom, K'ajolom; xutzininaq chik xek'oje wi ri kaj, ulew! Mata k'ut uq'ijilab'al, uq'ala'ib'al ri kej, tz'ikin; rumal Tz'aqol, B'itol; Uchuch, Uqajaw k'aslem, Winaqirem: Ab'anel, K'uxlanel; Alayrech, K'uxlay rech saqil amaqil; saqil al, saqil k' a'm.
Sentence 2: Xecha' k'ut.
Sentence 3: Ta xe'uchax chi k'oje chajal k' a'm.
Sentence 4: Xecha' k'ut.
Sentence 5: Lib'aj chi' 8 xwinaqirik: k' akalolinik, katolona puch upa kaj, upa ulew; kaj k' o wi.
Sentence 6: Ta xkitzijoj ronojel a ta chawaxoq, ta chi k'is tz'uq ronojel a ta xq'alaj, ta xna'ojixik saq petenaq ch'aqa palo utzijoxik qamujib'al, ilb'al re Popa Wuj , ilb'al saq petenaq ch'aqa palo utzijoxik puch ewaxib'al, saqirib' al rumal Tz' aqol, B'itol, Alom, K'ajolom kib'i'.
Sentence 7: E nimaq etamanel, e nimaq etamanel, e nab'e xkinojij, xkitzijoj puch; jusuk' xwinaqir ulew, juyub'-taq'aj; 10 xch'ob'och'ox ub'e ja': xb'inije'ik k'ole je raqan xo'l taq juyub' xa kachamanik, katz'ininik chi q'equ'm, chi aq'ab'.
Sentence 8: Nim upe' oxik, utzijoxik puch, xa ewal uwach ilol re, b'isol re.
Sentence 9: xchiqelesaj rumal maja b'i chik kiq'ij; xraj k'u kitij chik kiq'ij; xraj kitijtob'ej chik, xraj kitijtob'ej chik, Raxa Kaqulja; rox chik, Raxa Kaqulja; rox chik, xraj pu kinuk' chik ilb'al re ki' chelaj ronojel uwinaqil juyub': ri kej, tz'ikin; rumal Tz'aqol, B'itol, Alom, K'ajolom; xutzininaq chik ronojel ruk' xkib'an chik chi lolinik, ma xnawachir wi k'ut, xa remanik ja', xa li' anik palo, xa utukel ri Tz' aqol, B'itol, rnawi mixutzinik, mawi mixixch' awik.
Sentence 10: Lib'aj chi' 8 xwinaqirik: k' o wi.
The code is as follows.
MarkovChain.rb
    # Read the file at `text`, split it into sentences, and append every word
    # to `words`. The first word of each sentence is prefixed with "%START%".
    def paragraphSplit(text, words)
      input = File.read(text)
      # A sentence is a run of non-terminators plus the one character after it.
      input.scan(/[^.?!]*./).each do |s|
        s.split.each_with_index do |word, index|
          word = "%START%" + word if index == 0
          words.push(word)
        end
      end
    end

    # Build one sentence by chaining triples: the third word of the chosen
    # triple must equal the first word of the next one. The second parameter
    # is unused and kept only so that existing callers still work.
    def writeASentence(markov, _unused = nil)
      newSentence = ""
      suffix = ""
      100.times do
        candidates = markov.select do |a, _b, _c|
          newSentence.empty? ? a.include?("%START%") : a == suffix
        end
        # Dead end: no triple continues the chain (the original crashed here).
        break if candidates.empty?
        a, b, c = candidates.sample
        newSentence = newSentence.empty? ? "#{a} #{b} #{c}" : "#{newSentence} #{b} #{c}"
        suffix = c
        break if suffix.include?("%END%")
      end
      # Return the text even if no "%END%" was reached within 100 steps
      # (the original returned nil in that case).
      newSentence.gsub("%START%", "").gsub("%END%", "").strip
    end

    # Fill `markov` with [word, next, next-next] triples taken from `words`.
    # A word containing "." or "?" closes a sentence and is tagged "%END%".
    # Note that "!" is not treated as a terminator here, although
    # paragraphSplit splits on it.
    def markovDic(words, markov)
      return if words.size < 3
      ends = ->(w) { w.include?(".") || w.include?("?") }
      (0..words.size - 2).each do |i|
        first, second, third = words[i], words[i + 1], words[i + 2]
        next if ends.(first)
        if third.nil? || ends.(second)
          markov << [first, second, "%END%"]
        elsif ends.(third)
          markov << [first, second, third + "%END%"]
        else
          markov << [first, second, third]
        end
      end
    end
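Each generation step in `writeASentence` scans the whole `markov` array for triples whose first word matches the current suffix. As a minimal sketch of the same chaining rule, the triples could instead be grouped by their first word in a Hash, so that each step is a single lookup. The names `build_index` and `generate` and the toy triples are hypothetical and not part of the original script; the words are assumed to be already tagged with `%START%` and `%END%`.

```ruby
# Group [first, second, third] triples by their first word.
def build_index(triples)
  index = Hash.new { |hash, key| hash[key] = [] }
  triples.each { |first, second, third| index[first] << [second, third] }
  index
end

# Chain triples until a word tagged "%END%" is reached, the chain hits a
# dead end, or max_steps triples have been used.
def generate(index, max_steps = 100, rng = Random.new)
  starts = index.keys.select { |word| word.include?("%START%") }
  return "" if starts.empty?
  current = starts.sample(random: rng)
  sentence = [current]
  max_steps.times do
    followers = index.fetch(current, [])   # fetch does not create new keys
    break if followers.empty?
    second, third = followers.sample(random: rng)
    sentence << second << third
    current = third
    break if current.include?("%END%")
  end
  sentence.join(" ").gsub("%START%", "").gsub("%END%", "").strip
end

# Toy triples in the same shape that markovDic produces (illustrative only).
triples = [
  ["%START%Ri", "winaq", "xkimajij"],
  ["xkimajij", "utikik", "che'.%END%"],
]
index = build_index(triples)
puts generate(index)
```

With only one possible path through the toy triples, this prints `Ri winaq xkimajij utikik che'.` on every run. On a real corpus the Hash trades some memory for avoiding a full scan at every step.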
MarkovChain.rb
    # Load MarkovChain.rb from the same directory as this script.
    require_relative 'MarkovChain'

    words = []
    markov = []

    # Uncomment the corpus (or corpora) to use.
    #paragraphSplit('./ALMG.txt', words)
    #paragraphSplit('./MINEDUC.txt', words)
    #paragraphSplit('./MSKiche.txt', words)
    #paragraphSplit('./AjPilipsib4.txt', words)
    paragraphSplit('./PopWuj.txt', words)

    markovDic(words, markov)

    # Generate and print ten sentences.
    (1..10).each do |count|
      sentence = writeASentence(markov, nil)
      puts "Sentence #{count}: "
      puts sentence
      puts ""
    end
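Because the generator draws random numbers, every run prints different sentences, so output like the samples above cannot be reproduced exactly. The following is a small sketch of one way to make runs repeatable, assuming it is acceptable to seed Ruby's default generator with `Kernel#srand` before generating. The helper name `pick_many`, the word list, and the seed value 42 are arbitrary illustrations, not part of the original script.

```ruby
# Draw `n` random picks from `items` after seeding the default generator,
# so that the same seed always yields the same picks.
def pick_many(items, n, seed)
  srand(seed)   # Kernel#srand seeds the generator used by Kernel#rand
  Array.new(n) { items[rand(items.size)] }
end

candidates = ["k'olik.", "xub'ij", "winaq", "tz'ikin"]
first_run  = pick_many(candidates, 5, 42)
second_run = pick_many(candidates, 5, 42)
puts first_run == second_run   # same seed, same sequence
```

In the driver script, calling `srand` once with a fixed seed before the generation loop should have the same effect, since `writeASentence` draws from Ruby's default random generator.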