• tetris11@feddit.uk
      11 hours ago

      Hierarchical letter clustering would be my guess, or graph-based clustering using n-grams of length 2-4 as nodes and maximising the number of connections.
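
One possible reading of "hierarchical letter clustering", sketched in base R: cluster the month names by Levenshtein edit distance. The distance measure, the average linkage and the cut at `k = 4` are all assumptions made for illustration, not something the original picture is known to use.

```r
# Month names, lower-cased
tmonths <- c("january", "february", "march", "april", "may", "june",
             "july", "august", "september", "october", "november", "december")

# Pairwise Levenshtein edit distances between the names (utils::adist)
d <- adist(tmonths)
dimnames(d) <- list(tmonths, tmonths)

# Average-linkage hierarchical clustering on those distances
hc <- hclust(as.dist(d), method = "average")

# Cut the tree into 4 groups (an arbitrary choice) and list the members
groups <- cutree(hc, k = 4)
split(names(groups), groups)
```

`plot(hc)` draws the dendrogram if you want to see the full hierarchy rather than a single cut.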

      Or using an optimized Regex and printing out the DFA?
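
A rough idea of what the regex route could look like, again in base R. The pattern below is factored by hand so that shared suffixes like "uary" and "ber" appear only once; it is an illustrative guess, not the output of an actual regex optimiser, and base R has no way to print the underlying DFA.

```r
# Month names, lower-cased
tmonths <- c("january", "february", "march", "april", "may", "june",
             "july", "august", "september", "october", "november", "december")

# Hand-factored pattern (an assumption, not machine-optimised):
# shared suffixes such as "uary" and "ber" are written once.
month_re <- paste0("^((jan|febr)uary|ma(rch|y)|april|ju(ne|ly)|august|",
                   "((sept|nov|dec)em|octo)ber)$")

# Every real month should match
all(grepl(month_re, tmonths))

# A few misspellings should not
any(grepl(month_re, c("aprch", "maril", "febuary")))
```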

      Edit: Quick N-gram analysis (min=3, max=num letters in that month)

      R-code
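      library(ngram)
      
      # Month names, lower-cased
      tmonths = c("january", "february", "march",
                 "april", "may", "june", "july",
                 "august", "september", "october",
                 "november", "december")
      
      # For each month, list every letter n-gram from length 3 up to the whole word.
      # ngram_asweka() tokenises on spaces, so the letters are space-separated first.
      zzz = lapply(tmonths, function(mon){
        spaced = paste(strsplit(mon, split="")[[1]], collapse=" ")
        ng = ngram::ngram_asweka(spaced, min=3, max=nchar(mon))
        return(gsub(" ", "", ng))  # drop the spaces again
      })
      # Count each n-gram across all months and keep those seen more than once
      res = sort(table(unlist(zzz)))
      res[res > 1]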
      

      This gives the following 9 n-grams with a frequency greater than 1:

        ary   uar  uary   emb  embe ember   mbe  mber   ber 
          2     2     2     3     3     3     3     3     4 
      

      As you can see, the two longest recurring motifs are “em-ber” (3 months) and “uar-y” (2 months).

      Using this, I propose the following graph:

      Mermaid
      stateDiagram
          direction LR
          sept --> em
          nov --> em
          dec --> em
          em --> ber
          oc --> to
          to --> ber
          febr --> uar
          uar --> y
          jan --> uar
          ju --> ne
          ju --> l
          l --> y
          ma --> r
          ma --> y
          r --> ch
          
          a --> p 
          p --> r
          r --> il
          a --> u
          u --> gust
      
      

        • tetris11@feddit.uk
          11 hours ago

          I’m really disappointed by June, April and August. Without these months, everything would be so neat and orderly.

      • tetris11@feddit.uk
        11 hours ago

        Interestingly

        • Aprch
        • Maril

        are the only two hallucinations; every other path through the graph spells a legit month.
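
This can be checked by walking every path through the graph and gluing the node labels together. A minimal base R sketch, with the edge list typed in by hand from the diagram; the helper `spell()` is a hypothetical name, and the February start node is written `febr` here so that the path spells "february".

```r
# Month names, lower-cased
tmonths <- c("january", "february", "march", "april", "may", "june",
             "july", "august", "september", "october", "november", "december")

# Edges of the graph, one from/to pair per row
# (start node for February written as "febr": an assumption, see above)
edges <- matrix(c(
  "sept", "em",  "nov", "em",  "dec", "em",  "em", "ber",
  "oc", "to",    "to", "ber",
  "febr", "uar", "jan", "uar", "uar", "y",
  "ju", "ne",    "ju", "l",    "l", "y",
  "ma", "r",     "ma", "y",    "r", "ch",
  "a", "p",      "p", "r",     "r", "il",
  "a", "u",      "u", "gust"
), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))

# Start nodes are the ones with no incoming edge
starts <- setdiff(edges[, "from"], edges[, "to"])

# Depth-first walk: concatenate node labels along every start-to-sink path
spell <- function(node) {
  nxt <- edges[edges[, "from"] == node, "to"]
  if (length(nxt) == 0) return(node)  # sink: the path ends here
  unlist(lapply(nxt, function(n) paste0(node, spell(n))))
}

words <- unlist(lapply(starts, spell))

setdiff(words, tmonths)  # spellings that are not months
setdiff(tmonths, words)  # months the graph cannot spell
```

The first `setdiff()` should come back with just "maril" and "aprch", both caused by the shared `r` node, and the second should be empty.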