File: //opt/alt/python37/lib/python3.7/site-packages/tldextract/__pycache__/tldextract.cpython-37.pyc

`tldextract` accurately separates a URL's subdomain, domain, and public suffix.

It does this via the Public Suffix List (PSL).

    >>> import tldextract

    >>> tldextract.extract('http://forums.news.cnn.com/')
    ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)

    >>> tldextract.extract('http://forums.bbc.co.uk/') # United Kingdom
    ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)

    >>> tldextract.extract('http://www.worldbank.org.kg/') # Kyrgyzstan
    ExtractResult(subdomain='www', domain='worldbank', suffix='org.kg', is_private=False)

`ExtractResult` is a namedtuple, so it's simple to access the parts you want.

    >>> ext = tldextract.extract('http://forums.bbc.co.uk')
    >>> (ext.subdomain, ext.domain, ext.suffix)
    ('forums', 'bbc', 'co.uk')
    >>> # rejoin subdomain and domain
    >>> '.'.join(ext[:2])
    'forums.bbc'
    >>> # a common alias
    >>> ext.registered_domain
    'bbc.co.uk'

Note that subdomain and suffix are _optional_. Not all URL-like inputs have a
subdomain or a valid suffix.

    >>> tldextract.extract('google.com')
    ExtractResult(subdomain='', domain='google', suffix='com', is_private=False)

    >>> tldextract.extract('google.notavalidsuffix')
    ExtractResult(subdomain='google', domain='notavalidsuffix', suffix='', is_private=False)

    >>> tldextract.extract('http://127.0.0.1:8080/deployed/')
    ExtractResult(subdomain='', domain='127.0.0.1', suffix='', is_private=False)

If you want to rejoin the whole namedtuple, regardless of whether a subdomain
or suffix was found:

    >>> ext = tldextract.extract('http://127.0.0.1:8080/deployed/')
    >>> # this has unwanted dots
    >>> '.'.join(part for part in ext[:3])
    '.127.0.0.1.'
    >>> # join part only if truthy
    >>> '.'.join(part for part in ext[:3] if part)
    '127.0.0.1'

PUBLIC_SUFFIX_LIST_URLS
    https://publicsuffix.org/list/public_suffix_list.dat
    https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat

CACHE_TIMEOUT
    Read from the TLDEXTRACT_CACHE_TIMEOUT environment variable; used as the default `cache_fetch_timeout`.

ExtractResult
    namedtuple of a URL's subdomain, domain, suffix,
    and a flag that indicates whether the URL has a private suffix.

    Fields: subdomain (str), domain (str), suffix (str), is_private (bool, default False).

ExtractResult.registered_domain
        Joins the domain and suffix fields with a dot, if they're both set.

        >>> extract('http://forums.bbc.co.uk').registered_domain
        'bbc.co.uk'
        >>> extract('http://localhost:8080').registered_domain
        ''

ExtractResult.fqdn
        Returns a Fully Qualified Domain Name, if there is a proper domain/suffix.

        >>> extract('http://forums.bbc.co.uk/path/to/file').fqdn
        'forums.bbc.co.uk'
        >>> extract('http://localhost:8080').fqdn
        ''
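The `registered_domain` and `fqdn` behavior described above can be approximated with a plain `typing.NamedTuple`. This is a minimal sketch that follows only what the docstrings state; the class name `Result` is hypothetical, and the code is an illustration rather than the module's own implementation.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical stand-in for ExtractResult, for illustration only."""

    subdomain: str
    domain: str
    suffix: str
    is_private: bool = False

    @property
    def registered_domain(self) -> str:
        # Join domain and suffix with a dot only when both are non-empty.
        if self.domain and self.suffix:
            return f"{self.domain}.{self.suffix}"
        return ""

    @property
    def fqdn(self) -> str:
        # Require a domain and a suffix, then join only the non-empty parts,
        # so an empty subdomain does not leave a leading dot.
        if self.domain and self.suffix:
            return ".".join(part for part in self[:3] if part)
        return ""


print(Result("forums", "bbc", "co.uk").registered_domain)  # bbc.co.uk
print(Result("forums", "bbc", "co.uk").fqdn)               # forums.bbc.co.uk
print(repr(Result("", "localhost", "").fqdn))              # ''
```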

ExtractResult.ipv4
        Returns the ipv4 if that is what the presented domain/url is.

        >>> extract('http://127.0.0.1/path/to/file').ipv4
        '127.0.0.1'
        >>> extract('http://127.0.0.1.1/path/to/file').ipv4
        ''
        >>> extract('http://256.1.1.1').ipv4
        ''

ExtractResult.ipv6
        Returns the ipv6 if that is what the presented domain/url is.

        >>> extract('http://[aBcD:ef01:2345:6789:aBcD:ef01:127.0.0.1]/path/to/file').ipv6
        'aBcD:ef01:2345:6789:aBcD:ef01:127.0.0.1'
        >>> extract('http://[aBcD:ef01:2345:6789:aBcD:ef01:127.0.0.1.1]/path/to/file').ipv6
        ''
        >>> extract('http://[aBcD:ef01:2345:6789:aBcD:ef01:256.0.0.1]').ipv6
        ''
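Comparable address checks can be sketched with the standard-library `ipaddress` module. This swaps `ipaddress` in for the module's own IP-detection helpers, so edge-case behavior may differ; the helper names `as_ipv4` and `as_ipv6` are hypothetical. The calls mirror the ipv4 and ipv6 doctests.

```python
import ipaddress


def as_ipv4(host: str) -> str:
    """Return host if it parses as an IPv4 address, else ''."""
    try:
        ipaddress.IPv4Address(host)
    except ValueError:  # AddressValueError subclasses ValueError
        return ""
    return host


def as_ipv6(netloc: str) -> str:
    """Return the address inside [brackets] if it parses as IPv6, else ''."""
    # "[::]" (4 characters) is the shortest bracketed IPv6 literal.
    if len(netloc) < 4 or netloc[0] != "[" or netloc[-1] != "]":
        return ""
    inner = netloc[1:-1]
    try:
        ipaddress.IPv6Address(inner)
    except ValueError:
        return ""
    return inner


print(as_ipv4("127.0.0.1"))    # 127.0.0.1
print(repr(as_ipv4("256.1.1.1")))  # ''
print(as_ipv6("[aBcD:ef01:2345:6789:aBcD:ef01:127.0.0.1]"))
```

Embedded IPv4 notation in the last 32 bits of an IPv6 literal is accepted by `ipaddress`, which is why the first ipv6 doctest input parses.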

TLDExtract
    A callable for extracting subdomain, domain, and suffix components from a URL.

TLDExtract.__init__
        Construct a callable for extracting subdomain, domain, and suffix components from a URL.

        Upon calling it, it first checks for a JSON in `cache_dir`. By default,
        the `cache_dir` will live in the tldextract directory. You can disable
        the caching functionality of this module by setting `cache_dir` to `None`.

        If the cached version does not exist (such as on the first run), it requests the URLs in
        `suffix_list_urls` over HTTP, in order, until one returns public suffix list data. To
        disable HTTP requests, set this to an empty sequence.

        The default list of URLs points to the latest version of the Mozilla Public Suffix List and
        its mirror, but any similar document could be specified. Local files can be specified by
        using the `file://` protocol. (See `urllib2` documentation.)

        If there is no cached version loaded and no data is found from the `suffix_list_urls`,
        the module will fall back to the included TLD set snapshot. If you do not want
        this behavior, you may set `fallback_to_snapshot` to False, and an exception will be
        raised instead.

        The Public Suffix List includes a list of "private domains" as TLDs,
        such as blogspot.com. These do not fit `tldextract`'s definition of a
        suffix, so these domains are excluded by default. If you'd like them
        included instead, set `include_psl_private_domains` to True.

        You can pass additional suffixes in the `extra_suffixes` argument without changing the
        list URLs.

        `cache_fetch_timeout` is passed unmodified to the underlying request object,
        per the requests documentation here:
        http://docs.python-requests.org/en/master/user/advanced/#timeouts

        `cache_fetch_timeout` can also be set to a single value with the
        environment variable TLDEXTRACT_CACHE_TIMEOUT, like so:

        TLDEXTRACT_CACHE_TIMEOUT="1.2"

        When set this way, the same timeout value will be used for both connect
        and read timeouts.

        Raises ValueError when `suffix_list_urls` is empty, `cache_dir` is unset, and
        `fallback_to_snapshot` is False, with the message: "The arguments you have
        provided disable all ways for tldextract to obtain data. Please provide a suffix
        list data, a cache_dir, or set `fallback_to_snapshot` to `True`."
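The constructor checks described in the `__init__` docstring can be sketched as two small functions. Both names (`resolve_timeout`, `check_sources`) are hypothetical; the sketch assumes, as the docstring says, that a string timeout such as the value of `TLDEXTRACT_CACHE_TIMEOUT` becomes one float used for both connect and read timeouts, and that a configuration with no URL, no cache directory, and no snapshot fallback is rejected.

```python
import os
from typing import Optional, Sequence, Tuple, Union


def resolve_timeout(value: Union[str, float, None]) -> Optional[float]:
    # A string (e.g. read from the environment) becomes a float;
    # numbers and None pass through unchanged.
    return float(value) if isinstance(value, str) else value


def check_sources(
    suffix_list_urls: Optional[Sequence[str]],
    cache_dir: Optional[str],
    fallback_to_snapshot: bool,
) -> Tuple[str, ...]:
    # Drop blank URLs, then refuse a configuration with no data source at all.
    urls = tuple(url.strip() for url in (suffix_list_urls or ()) if url.strip())
    if not urls and not cache_dir and not fallback_to_snapshot:
        raise ValueError("no suffix list URL, cache_dir, or snapshot fallback")
    return urls


# None when the environment variable is unset.
timeout = resolve_timeout(os.environ.get("TLDEXTRACT_CACHE_TIMEOUT"))
```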

TLDExtract.__call__
        Alias for `extract_str`.

TLDExtract.extract_str
        Take a string URL and split it into its subdomain, domain, and suffix components.

        I.e. its effective TLD, gTLD, ccTLD, etc. components.

        >>> extractor = TLDExtract()
        >>> extractor.extract_str('http://forums.news.cnn.com/')
        ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)
        >>> extractor.extract_str('http://forums.bbc.co.uk/')
        ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)
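At its core, the split locates where the suffix starts among the host's dot-separated labels. The naive sketch below uses a hard-coded suffix set and a linear scan; the module itself walks a reverse-label trie and also handles PSL wildcard and exception rules, punycode, and IP addresses, none of which this sketch attempts. The names `Parts` and `naive_split` are hypothetical.

```python
from typing import AbstractSet, NamedTuple


class Parts(NamedTuple):
    subdomain: str
    domain: str
    suffix: str


def naive_split(host: str, suffixes: AbstractSet[str]) -> Parts:
    labels = host.lower().split(".")
    # Find the leftmost position where the remaining labels form a known
    # suffix; scanning left to right returns the longest match.
    index = len(labels)  # len(labels) means "no suffix found"
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            index = i
            break
    suffix = ".".join(labels[index:])
    domain = labels[index - 1] if index > 0 else ""
    subdomain = ".".join(labels[: index - 1]) if index > 1 else ""
    return Parts(subdomain, domain, suffix)


SUFFIXES = {"com", "uk", "co.uk"}  # tiny illustrative subset of the PSL
print(naive_split("forums.news.cnn.com", SUFFIXES))
# Parts(subdomain='forums.news', domain='cnn', suffix='com')
```

As in the module docstring's `google.notavalidsuffix` example, a host with no known suffix keeps its last label as the domain and everything before it as the subdomain.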

TLDExtract.extract_urllib
        Take the output of urllib.parse URL parsing methods and further split the parsed URL.

        Splits the parsed URL into its subdomain, domain, and suffix
        components, i.e. its effective TLD, gTLD, ccTLD, etc. components.

        This method is like `extract_str` but faster, as the string's domain
        name has already been parsed.

        >>> extractor = TLDExtract()
        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.news.cnn.com/'))
        ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)
        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.bbc.co.uk/'))
        ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)
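Because `urllib.parse` has already separated the scheme, path, and query, only the network location still needs work. The sketch below (hypothetical helper `netloc_labels`) takes a `SplitResult`, normalizes the ideographic and full-width full stops 。 (U+3002), ． (U+FF0E), and ｡ (U+FF61) to ASCII dots, as this module's `_extract_netloc` appears to do, and splits the result into labels. `.netloc` keeps any port or userinfo, which this sketch does not strip.

```python
from typing import List
from urllib.parse import SplitResult, urlsplit

# Ideographic and full-width full stops treated as label separators.
_DOTS = ("\u3002", "\uff0e", "\uff61")


def netloc_labels(parsed: SplitResult) -> List[str]:
    netloc = parsed.netloc  # scheme, path, and query already removed by urlsplit
    for dot in _DOTS:
        netloc = netloc.replace(dot, ".")
    return netloc.split(".")


print(netloc_labels(urlsplit("http://forums.bbc.co.uk/path?q=1")))
# ['forums', 'bbc', 'co', 'uk']
```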

TLDExtract.update
        Force fetch the latest suffix list definitions.

TLDExtract.tlds
        Returns the list of TLDs used by default.

        This will vary based on `include_psl_private_domains` and `extra_suffixes`.

TLDExtract._get_tld_extractor
        Get or compute this object's TLDExtractor.

        Looks up the TLDExtractor in roughly the following order, based on the
        settings passed to __init__:

        1. Memoized on `self`
        2. Local system cache file
        3. Remote PSL, over HTTP
        4. Bundled PSL snapshot file

        Raises ValueError when the public, private, and extra suffix lists are all
        empty, with the message: "No tlds set. Cannot proceed without tlds."
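The lookup order above amounts to memoization in front of an ordered chain of fallbacks. A minimal sketch, assuming each source is a zero-argument callable returning a set of suffixes or `None`; the class `MemoizedSuffixSource` and the loader functions are hypothetical (in this module the cache, remote, and snapshot steps appear to be delegated to `get_suffix_lists`).

```python
from typing import Callable, FrozenSet, Optional, Sequence

Loader = Callable[[], Optional[FrozenSet[str]]]


class MemoizedSuffixSource:
    """Hypothetical sketch: memoize the first non-empty result of ordered loaders."""

    def __init__(self, loaders: Sequence[Loader]) -> None:
        self._loaders = list(loaders)  # e.g. cache file, remote PSL, bundled snapshot
        self._memo: Optional[FrozenSet[str]] = None

    def get(self) -> FrozenSet[str]:
        if self._memo is not None:  # step 1: already computed on this object
            return self._memo
        for load in self._loaders:  # remaining steps, tried in order
            suffixes = load()
            if suffixes:
                self._memo = suffixes
                return suffixes
        raise ValueError("No tlds set. Cannot proceed without tlds.")

    def clear(self) -> None:
        # Forget the memoized result so the next get() reloads.
        self._memo = None


calls = []


def from_cache():
    calls.append("cache")
    return None  # simulate a cache miss


def from_remote():
    calls.append("remote")
    return frozenset({"com", "co.uk"})


source = MemoizedSuffixSource([from_cache, from_remote])
source.get()
source.get()
print(calls)  # ['cache', 'remote']
```

The second `get()` never touches the loaders; calling `clear()` first forces a full reload, similar in spirit to `TLDExtract.update`.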

Trie
    Trie for storing eTLDs with their labels in reverse order.

Trie.create
        Create a Trie from a list of suffixes and return its root node.

Trie.add_suffix
        Append a suffix's labels to this Trie node.
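A reverse-label trie stores `co.uk` as the path `uk` → `co`, so a host can be matched from its rightmost label inward. A minimal sketch of `create` and `add_suffix` under that reading; the attribute names `matches`, `end`, and `is_private` are assumed for illustration, and this is not the module's source.

```python
from typing import Dict, Iterable, Optional


class Trie:
    """Sketch of a trie keyed by domain labels in reverse order."""

    def __init__(self) -> None:
        self.matches: Dict[str, "Trie"] = {}  # child nodes keyed by label
        self.end = False  # True if a complete suffix ends at this node
        self.is_private = False

    @staticmethod
    def create(
        public_suffixes: Iterable[str],
        private_suffixes: Optional[Iterable[str]] = None,
    ) -> "Trie":
        root = Trie()
        for suffix in public_suffixes:
            root.add_suffix(suffix)
        for suffix in private_suffixes or ():
            root.add_suffix(suffix, is_private=True)
        return root

    def add_suffix(self, suffix: str, is_private: bool = False) -> None:
        node = self
        # "co.uk" -> ["uk", "co"]: walk from the rightmost label inward.
        for label in reversed(suffix.split(".")):
            node = node.matches.setdefault(label, Trie())
        node.end = True
        node.is_private = is_private


root = Trie.create(["com", "co.uk"], ["blogspot.com"])
print(sorted(root.matches))  # ['com', 'uk']
```

Intermediate nodes such as `uk` exist only as path steps here, so their `end` flag stays False unless `uk` itself is added as a suffix.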

extract, update
    Module-level shortcuts that delegate to a shared default TLDExtract instance, TLD_EXTRACTOR.

_PublicSuffixListTLDExtractor
    Wrapper around this project's main algorithm for PSL lookups.

_PublicSuffixListTLDExtractor.tlds
        Get the currently filtered list of suffixes.

_PublicSuffixListTLDExtractor.suffix_index
        Return the index of the first suffix label, and whether it is private.

        Returns len(spl) if no suffix is found.
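The lookup walks a host's labels right to left through a reverse-label trie, remembering the last position where a complete suffix ended. In this simplified sketch, PSL wildcard rules (`*.ck`) and exception rules (`!www.ck`) are stored as ordinary trie labels and consulted only when no explicit label matches. The names `Node`, `build`, and `suffix_index` are hypothetical; punycode decoding and the private-domain flag are omitted.

```python
from typing import Dict, List


class Node:
    def __init__(self) -> None:
        self.matches: Dict[str, "Node"] = {}
        self.end = False  # a complete rule ends here


def build(rules: List[str]) -> Node:
    root = Node()
    for rule in rules:
        node = root
        for label in reversed(rule.split(".")):
            node = node.matches.setdefault(label, Node())
        node.end = True
    return root


def suffix_index(labels: List[str], root: Node) -> int:
    """Index of the first suffix label; len(labels) if no suffix is found."""
    node = root
    i = j = len(labels)
    for label in reversed(labels):
        label = label.lower()
        if label in node.matches:
            j -= 1
            node = node.matches[label]
            if node.end:
                i = j  # longest complete suffix seen so far
            continue
        if "*" in node.matches:  # wildcard rule such as "*.ck"
            if "!" + label in node.matches:  # exception such as "!www.ck"
                return j  # current label is not part of the suffix
            return j - 1  # wildcard absorbs the current label
        break
    return i


root = build(["com", "uk", "co.uk", "*.ck", "!www.ck"])
print(suffix_index(["forums", "bbc", "co", "uk"], root))  # 2
print(suffix_index(["foo", "bar", "ck"], root))           # 1
print(suffix_index(["www", "ck"], root))                  # 1
```

For `foo.bar.ck` the wildcard makes `bar.ck` the suffix, while the exception rule keeps `www.ck` registrable with `ck` as its suffix.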

_decode_punycode
    Lowercases a label and, if it starts with "xn--", decodes it with `idna.decode`;
    on UnicodeError or IndexError it returns the lowercased label instead.
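Labels in the `xn--` form are Punycode (IDNA) encodings of non-ASCII names, which is why a label is decoded before it is looked up. The sketch below shows the lowercase-then-decode step but swaps in the standard library's `idna` codec (IDNA 2003) for the third-party `idna` package, so results may differ for some inputs; `decode_label` is a hypothetical name.

```python
def decode_label(label: str) -> str:
    lowered = label.lower()
    if lowered.startswith("xn--"):
        try:
            # Standard-library IDNA 2003 codec; raises UnicodeError
            # (including UnicodeEncodeError) for malformed input.
            return lowered.encode("ascii").decode("idna")
        except (UnicodeError, IndexError):
            pass  # fall back to the lowercased label
    return lowered


print(decode_label("XN--BCHER-KVA"))  # bücher
print(decode_label("COM"))            # com
```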