Before you can work with URLs, you must create URLs.
If you already have a textual URL, the easiest way to get URL objects
is with the parse() function:
parse(url, decoded=True, lazy=False)
Automatically turn text into a structured URL object.
>>> url = parse(u"https://github.com/python-hyper/hyperlink")
- url – A text string representation of a URL.
- decoded – Whether or not to return a DecodedURL,
which automatically handles all
encoding/decoding/quoting/unquoting for all the various
accessors of parts of the URL, or a URL,
which has the same API, but requires handling of special
characters for different parts of the URL.
- lazy – In the case of decoded=True, this controls
whether the URL is decoded immediately or as accessed. The
default, lazy=False, checks all encoded parts of the URL
for decodability up front.
By default, parse() returns an instance of
DecodedURL, a URL type that handles all encoding for you, by
wrapping the lower-level URL.
The lower-level URL looks very similar to the
DecodedURL, but does not handle all encoding cases for
you. Use with caution.
URL is also available as an alias,
hyperlink.EncodedURL, for more explicit usage.
URL(scheme=None, host=None, path=(), query=(), fragment=u'', port=None, rooted=None, userinfo=u'', uses_netloc=None)
From blogs to billboards, URLs are so common that it’s easy to
overlook their complexity and power. With hyperlink’s
URL type, working with URLs doesn’t have to be hard.
URLs are made of many parts. Most of these parts are officially
named in RFC 3986 and this diagram may prove handy in identifying
them:
   foo://user:pass@example.com:8042/over/there?name=ferret#nose
   \_/   \_______/ \_________/ \__/\_________/ \_________/ \__/
    |        |          |        |      |           |        |
  scheme  userinfo     host     port   path       query   fragment
While from_text() is used for parsing whole URLs, the
URL constructor builds a URL from the individual
components, like so:
>>> from hyperlink import URL
>>> url = URL(scheme=u'https', host=u'example.com', path=[u'hello', u'world'])
The constructor runs basic type checks. All strings are expected
to be text (str in Python 3, unicode in Python 2). All
arguments are optional, defaulting to appropriately empty values. A full
list of constructor arguments is below.
- scheme – The text name of the scheme.
- host – The host portion of the network location.
- port – The port part of the network location. If None or no port is
passed, the port will default to the default port of the scheme, if
it is known. See register_default_port() for more info.
- path – A tuple of strings representing the slash-separated parts of the
path, each percent-encoded.
- query – The query parameters, as a dictionary or as a sequence of
percent-encoded key-value pairs.
- fragment – The fragment part of the URL.
- rooted – A rooted URL is one which indicates an absolute path.
This is True on any URL that includes a host, or any relative URL
that starts with a slash.
- userinfo – The username or colon-separated username:password pair.
- uses_netloc – Indicates whether :// (the “netloc separator”) will
appear to separate the scheme from the path in cases where no
host is present.
Setting this to True is a non-spec-compliant affordance for the
common practice of having URIs that are not URLs (cannot have a
‘host’ part) but nevertheless use the common :// idiom that
most people associate with URLs; e.g. message: URIs like
message://message-id being equivalent to message:message-id.
This may be inferred based on the scheme, depending on whether
register_scheme() has been used to register the scheme, and
should not be passed directly unless you know the scheme works like
this and you know it has not been registered.
All of these parts are also exposed as read-only attributes of
URL instances, along with several useful methods.
Whereas the URL constructor is useful for constructing
URLs from parts, from_text() supports parsing whole
URLs from their string form.
from_text() is also used as the repr() of
URL objects, and is the natural counterpart to
to_text(). This method only accepts text, so be
sure to decode those bytestrings.
Parameters: text – A valid URL string.
Returns: The structured object version of the parsed string.
Somewhat unexpectedly, URLs are a far more permissive
format than most would assume. Many strings which don’t
look like URLs are still valid URLs. As a result, this
method only raises
URLParseError on invalid port
and IPv6 values in the host portion of the URL.