charsetprober.py (compiled Python 3.6 module; recorded source path /usr/lib/python3.6/charsetprober.py)

Imports: logging, re, ProbingState

class CharSetProber
    __init__(self, lang_filter=None): sets _state to None, stores lang_filter, and creates a logger with logging.getLogger(__name__)
    reset(self): sets _state to ProbingState.DETECTING
    charset_name(self): returns None
    feed(self, buf): returns None
    state(self): returns _state
    get_confidence(self): returns 0.0
    filter_high_byte_only(buf): calls re.sub to replace each run of non-high bytes with a single space
    filter_international_words(buf):
        We define three types of bytes:
            alphabet: English alphabets [a-zA-Z]
            international: international characters [\x80-\xFF]
            marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought of as a series of words delimited
        by markers. This function filters out all words except those that
        contain at least one international character. All contiguous
        sequences of markers are replaced by a single ASCII space character.

        This filter applies to all scripts which do not use English characters.
    filter_international_words(buf), continued:
        Word pattern passed to re.findall: [a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?
        Local names: filtered (a bytearray), words, word, last_char
        Each matched word is appended to filtered; a trailing character that is neither alphabetic nor a high byte is replaced before it is appended. The function returns filtered.

    Docstring of the function that follows:
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.

        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
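The word-filtering rule described in the `filter_international_words` docstring can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the module's recovered source: the "international" byte range is assumed to be `\x80-\xFF` (the high bytes did not survive in the dump), and the replacement of a trailing marker by a space is inferred from the docstring.

```python
import re

# Assumption: "international" bytes are 0x80-0xFF; the raw byte values are
# not legible in the dump, so this range is inferred from the docstring.
_WORD_RE = re.compile(rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?")


def filter_international_words(buf: bytes) -> bytearray:
    """Keep only words containing at least one high byte.

    A word may carry one trailing marker byte; that marker is
    replaced by a single ASCII space.
    """
    filtered = bytearray()
    for word in _WORD_RE.findall(buf):
        # Everything except the final byte is copied unchanged.
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # bytes.isalpha() is ASCII-only, so a high byte is not "alpha";
        # the second test keeps high bytes and replaces only ASCII markers.
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered


sample = b"hello caf\xe9, na\xefve world"
result = filter_international_words(sample)
# result == bytearray(b"caf\xe9 na\xefve ")
```

The pure-ASCII words `hello` and `world` are dropped because the pattern requires at least one high byte, while the comma and the space that end the two matched words both become a single space.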