.. _usage: Getting Started =============== Installation ------------ **From PyPi**:: $ pip install pytubes **From source**:: $ pip install -r build_requirements.txt $ python setup.py install Usage ----- Usage is very simple: #. Import ``tubes`` #. create an input tube (currently either: :class:`tube.Each` or :class:`tube.Count`) to get some data into the tube #. call methods on the input tube to build up each step of the processing (e.g. ``read_files().split().json()``...) #. Iterate over the tube to generate the data, by either: - Calling ``list(tube)`` - looping over it in a for-loop: ``for item in tube:`` - or: Calling ``x = iter(tube)``, and then ``next(x)`` repeatedly. Some Examples ~~~~~~~~~~~~~ >>> from tubes import Each, Count >>> list(Count().first(5)) [0, 1, 2, 3, 4] >>> from urllib.request import urlopen >>> response = urlopen("https://dumps.wikimedia.org/other/pageviews/2019/2019-07/pageviews-20190716-140000.gz") >>> dict(Each([response]).read_fileobj().gunzip(stream=True) # Stream the response and gunzip it .tsv(sep=" ", skip_empty_rows=True) # Parse as a TSV file (with spaces not tabs) .skip_unless(lambda x: x.get(0).to(bytes).equals(b"en")) # EN wikipedia only .skip_unless(lambda x: x.get(2).to(int).gt(10_000)) # Only include pages with viewcount > 10,000 .first(3) # Get the first 5 only .multi(lambda x: ( # Extract Column 1(Page title) and Column 2(Page count) x.get(1).to(str), x.get(2).to(int)) ) ) {'-': 31066, 'Main_Page': 709331, 'Special:Search': 49869}