• tetris11@feddit.uk
    13 hours ago

    Genuine Question:

    If you could split each month name into three parts, how would you split them to maximise the overlap between the parts?

    • “em” is a good overlap for nov/sept/dec
    • “uar” is good for jan/febr
      • tetris11@feddit.uk
        11 hours ago

        Hierarchical letter clustering would be my guess, or graph-based clustering using n-grams of length 2-4 as nodes and maximising the number of connections.

        Or using an optimized Regex and printing out the DFA?
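
        A minimal sketch of the hierarchical-clustering idea, using only base R. The distance measure (Levenshtein distance via adist()), the average linkage and the cut at k = 4 are all assumptions made for illustration; the comment above does not specify any of them.

        ```r
        # Month names, spelled as in the rest of this thread
        tmonths <- c("january", "february", "march", "april", "may", "june",
                     "july", "august", "september", "october", "november", "december")

        # Pairwise Levenshtein (edit) distances between whole names; adist() is in utils
        d <- adist(tmonths)
        dimnames(d) <- list(tmonths, tmonths)

        # Average-linkage hierarchical clustering on those distances (assumed linkage)
        hc <- hclust(as.dist(d), method = "average")

        # Cut the tree into an arbitrary number of groups (k = 4 is an assumption)
        groups <- cutree(hc, k = 4)
        print(split(names(groups), groups))
        ```

        Calling plot(hc) draws the dendrogram. Note that this clusters whole names, so it only hints at which months might share pieces; it does not pick the split points itself.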

        Edit: Quick N-gram analysis (min=3, max=num letters in that month)

        R-code
        library(ngram)
        
        tmonths = c("january", "february", "march",
                   "april", "may", "june", "july",
                   "august", "september", "october",
                   "november", "december")
        
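        # For each month, space-separate its letters so that ngram_asweka()
        # treats every letter as a token, then collect all n-grams from
        # length 3 up to the full length of the name.
        zzz = lapply(tmonths, function(mon){
          ng = ngram::ngram_asweka(paste(unlist(strsplit(mon, split="")), collapse=" "), min=3, max=nchar(mon))
          return(gsub(" ", "", ng))  # remove the spaces again
        })
        # Count each n-gram across all months and keep those seen more than once
        res = sort(table(unlist(zzz)))
        res[res > 1]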
        

        This gives the following 9 n-grams with a frequency greater than 1:

          ary   uar  uary   emb  embe ember   mbe  mber   ber 
            2     2     2     3     3     3     3     3     4 
        

        As you can see, the two longest shared motifs are “em-ber” and “uar-y”.
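
        For anyone without the ngram package, here is a rough base-R cross-check of the same counts that also lists which months share each motif. The helper name substrings() and the data-frame layout are made up for this sketch.

        ```r
        tmonths <- c("january", "february", "march", "april", "may", "june",
                     "july", "august", "september", "october", "november", "december")

        # All distinct substrings of length >= min_len in one word (hypothetical helper)
        substrings <- function(word, min_len = 3) {
          n <- nchar(word)
          if (n < min_len) return(character(0))
          out <- character(0)
          for (len in min_len:n) {
            starts <- seq_len(n - len + 1)
            out <- c(out, substring(word, starts, starts + len - 1))
          }
          unique(out)
        }

        # One row per (substring, month) pair
        pairs <- do.call(rbind, lapply(tmonths, function(mon) {
          data.frame(ngram = substrings(mon), month = mon, stringsAsFactors = FALSE)
        }))

        # Group months by substring and keep substrings found in more than one month
        by_ngram <- split(pairs$month, pairs$ngram)
        shared <- by_ngram[lengths(by_ngram) > 1]
        print(shared)
        ```

        Looking up shared[["ber"]] or shared[["uary"]] then shows exactly which months back each motif.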

        Using this I propose the following graph

        Mermaid
        stateDiagram
            direction LR
            sept --> em
            nov --> em
            dec --> em
            em --> ber
            oc --> to
            to --> ber
            feb --> uar
            uar --> y
            jan --> uar
            ju --> ne
            ju --> l
            l --> y
            ma --> r
            ma --> y
            r --> ch
            
            a --> p 
            p --> r
            r --> il
            a --> u
            u --> gust
        
        

          • tetris11@feddit.uk
            11 hours ago

            I’m really disappointed by June, April and August. Without these months, everything would be so neat and orderly.

        • tetris11@feddit.uk
          11 hours ago

          Interestingly

          • Aprch
          • Maril

          are the only two hallucinations; every other path through the graph spells a legitimate month.
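
          One way to check this is to enumerate every path through the graph and compare the results against the real month names. A small sketch in R, assuming each node label in the Mermaid diagram stands for the literal letters it emits and that nodes with the same label are the same node (which is why “r” is shared between March and April). The edge list is copied by hand from the diagram; the helper walk() is made up for this sketch.

          ```r
          # Edges copied from the Mermaid diagram in the parent comment
          edges <- list(
            sept = "em", nov = "em", dec = "em", em = "ber",
            oc = "to", to = "ber",
            feb = "uar", jan = "uar", uar = "y",
            ju = c("ne", "l"), l = "y",
            ma = c("r", "y"), r = c("ch", "il"),
            a = c("p", "u"), p = "r", u = "gust"
          )

          # Start nodes are the ones that never appear as a target
          starts <- setdiff(names(edges), unlist(edges))

          # Depth-first walk that concatenates labels along every start-to-leaf path
          walk <- function(node) {
            nxt <- edges[[node]]
            if (is.null(nxt)) return(node)  # leaf: no outgoing edges
            paste0(node, unlist(lapply(nxt, walk)))
          }

          words <- unlist(lapply(starts, walk))

          tmonths <- c("january", "february", "march", "april", "may", "june",
                       "july", "august", "september", "october", "november", "december")

          hallucinations <- setdiff(words, tmonths)  # paths that are not month names
          missing_months <- setdiff(tmonths, words)  # months the graph cannot spell
          print(hallucinations)
          print(missing_months)
          ```

          Under those assumptions, hallucinations should contain only "maril" and "aprch", and missing_months should be empty. Giving March and April separate “r” nodes would remove both.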