Metadata-Version: 2.1
Name: haproxy-log-analysis
Version: 5.1.0
Summary: HAProxy log analyzer that tries to give an insight into what's going on
Home-page: https://github.com/gforcada/haproxy_log_analysis
Author: Gil Forcada
Author-email: gil.gnome@gmail.com
License: GPL v3
Keywords: haproxy log analysis report
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Internet :: Log Analysis
Requires-Python: >=3.7
License-File: LICENSE


.. -*- coding: utf-8 -*-

HAProxy log analyzer
====================
This Python package is a `HAProxy`_ log parser.
It analyzes HAProxy log files in multiple ways (see commands section below).

.. note::
   Currently only the `HTTP log format`_ is supported.

Tests and coverage
------------------
No project is trustworthy if it does not have tests and decent coverage!

.. image:: https://github.com/gforcada/haproxy_log_analysis/actions/workflows/tests.yml/badge.svg?branch=master
   :target: https://github.com/gforcada/haproxy_log_analysis/actions/workflows/tests.yml

.. image:: https://coveralls.io/repos/github/gforcada/haproxy_log_analysis/badge.svg?branch=master
   :target: https://coveralls.io/github/gforcada/haproxy_log_analysis?branch=master


Documentation
-------------
See the `documentation and API`_ at ReadTheDocs_.

Command-line interface
----------------------
The current ``--help`` looks like this::

  usage: haproxy_log_analysis [-h] [-l LOG] [-s START] [-d DELTA] [-c COMMAND]
                              [-f FILTER] [-n] [--list-commands]
                              [--list-filters] [--json]

  Analyze HAProxy log files and outputs statistics about it

  optional arguments:
    -h, --help            show this help message and exit
    -l LOG, --log LOG     HAProxy log file to analyze
    -s START, --start START
                          Process log entries starting at this time, in HAProxy
                          date format (e.g. 11/Dec/2013 or
                          11/Dec/2013:19:31:41). At least provide the
                          day/month/year. Values not specified will use their
                          base value (e.g. 00 for hour). Use in conjunction with
                          -d to limit the number of entries to process.
    -d DELTA, --delta DELTA
                          Limit the number of entries to process. Express the
                          time delta as a number and a time unit, e.g.: 1s, 10m,
                          3h or 4d (for 1 second, 10 minutes, 3 hours or 4
                          days). Use in conjunction with -s to only analyze
                          certain time delta. If no start time is given, the
                          time on the first line will be used instead.
    -c COMMAND, --command COMMAND
                          List of commands, comma separated, to run on the log
                          file. See --list-commands to get a full list of them.
    -f FILTER, --filter FILTER
                          List of filters to apply on the log file. Passed as
                          comma separated and parameters within square brackets,
                          e.g ip[192.168.1.1],ssl,path[/some/path]. See --list-
                          filters to get a full list of them.
    -n, --negate-filter   Make filters passed with -f work the other way around,
                          i.e. if the ``ssl`` filter is passed instead of
                          showing only ssl requests it will show non-ssl
                          traffic. If the ``ip`` filter is used, then all but
                          that ip passed to the filter will be used.
    --list-commands       Lists all commands available.
    --list-filters        Lists all filters available.
    --json                Output results in json.
    --invalid             Print the lines that could not be parsed. Be aware
                          that mixing it with the print command will mix their
                          output.

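The ``-s`` and ``-d`` value formats described above map naturally onto Python's ``datetime`` and ``timedelta`` types.
The following is a minimal sketch of that mapping, not the package's own parsing code:
``parse_start`` and ``parse_delta`` are hypothetical helpers,
and month names such as ``Dec`` are assumed to be parsed under an English/C locale.

.. code-block:: python

   import re
   from datetime import datetime, timedelta

   # Seconds per unit, for the units listed in the help text (s, m, h, d).
   _UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

   # Accepted start formats, most specific first; unspecified parts default to 00.
   _START_FORMATS = (
       "%d/%b/%Y:%H:%M:%S",
       "%d/%b/%Y:%H:%M",
       "%d/%b/%Y:%H",
       "%d/%b/%Y",
   )


   def parse_start(value):
       """Parse a start time such as '11/Dec/2013' or '11/Dec/2013:19:31:41'."""
       for fmt in _START_FORMATS:
           try:
               return datetime.strptime(value, fmt)
           except ValueError:
               continue
       raise ValueError(f"not a valid start time: {value!r}")


   def parse_delta(value):
       """Parse a delta such as '1s', '10m', '3h' or '4d' into a timedelta."""
       match = re.fullmatch(r"(\d+)([smhd])", value)
       if match is None:
           raise ValueError(f"not a valid delta: {value!r}")
       amount, unit = match.groups()
       return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


   # A start time plus a delta describes the time window to analyze.
   start = parse_start("11/Dec/2013:19")
   end = start + parse_delta("10m")

In this sketch, ``start`` and ``end`` delimit the window that ``-s 11/Dec/2013:19 -d 10m`` describes; how the tool treats entries exactly on the boundaries is not covered here.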

Commands
--------

Commands are small, purpose-specific programs that each report particular statistics about the log file being analyzed.
See them all with ``--list-commands`` or online at https://haproxy-log-analyzer.readthedocs.io/modules.html#module-haproxy.commands.

- ``average_response_time``
- ``average_waiting_time``
- ``connection_type``
- ``counter``
- ``http_methods``
- ``ip_counter``
- ``print``
- ``queue_peaks``
- ``request_path_counter``
- ``requests_per_hour``
- ``requests_per_minute``
- ``server_load``
- ``slow_requests``
- ``slow_requests_counter``
- ``status_codes_counter``
- ``top_ips``
- ``top_request_paths``

Filters
-------
Filters, contrary to commands,
are a way to reduce the number of log lines to be processed.

.. note::
   The ``-n`` command line argument inverts the filters' output.

   This helps when looking for specific traces, like a certain IP or a path.

See them all with ``--list-filters`` or online at https://haproxy-log-analyzer.readthedocs.io/modules.html#module-haproxy.filters.

- ``backend``
- ``frontend``
- ``http_method``
- ``ip``
- ``ip_range``
- ``path``
- ``response_size``
- ``server``
- ``slow_requests``
- ``ssl``
- ``status_code``
- ``status_code_family``
- ``wait_on_queues``
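
The ``-f`` syntax shown in the help output (comma-separated filter names, each with an optional parameter in square brackets) can be split with a small regular expression.
This is only a sketch: ``parse_filters`` is a hypothetical helper, not the package's own parser,
and it assumes that a parameter never contains a ``]`` character.

.. code-block:: python

   import re

   # A filter name, optionally followed by a parameter in square brackets.
   _FILTER_RE = re.compile(r"(?P<name>\w+)(?:\[(?P<param>[^\]]*)\])?")


   def parse_filters(spec):
       """Split a ``-f`` value into (name, parameter) pairs.

       The parameter is None for filters given without square brackets.
       """
       filters = []
       position = 0
       while position < len(spec):
           match = _FILTER_RE.match(spec, position)
           if match is None:
               raise ValueError(f"unexpected text at offset {position}: {spec!r}")
           filters.append((match.group("name"), match.group("param")))
           position = match.end()
           # Each filter must be followed by a comma or by the end of the string.
           if position < len(spec):
               if spec[position] != ",":
                   raise ValueError(f"expected ',' at offset {position}: {spec!r}")
               position += 1
       return filters


   parsed = parse_filters("ip[192.168.1.1],ssl,path[/some/path]")

Scanning with a regular expression, rather than splitting on commas, keeps a parameter intact even if it contains a comma itself.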

Installation
------------
Install the package with ``pip``; afterwards a console script ``haproxy_log_analysis`` is available::

    $ pip install haproxy_log_analysis
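
Once installed, the console script can also be driven from Python.
The sketch below only assembles an argument list from the flags documented in the help output, and runs it if the script is found on ``PATH``.
``build_command`` and the file name ``haproxy.log`` are illustrative assumptions,
and since the structure of the ``--json`` output is not described here, the output is printed as-is.

.. code-block:: python

   import shutil
   import subprocess


   def build_command(log, commands, filters=None, negate=False, as_json=False):
       """Build the argument list for a haproxy_log_analysis invocation."""
       argv = ["haproxy_log_analysis", "--log", log, "--command", ",".join(commands)]
       if filters:
           argv += ["--filter", ",".join(filters)]
       if negate:
           argv.append("--negate-filter")
       if as_json:
           argv.append("--json")
       return argv


   argv = build_command(
       "haproxy.log",  # hypothetical log file name
       ["counter", "top_ips"],
       filters=["ssl", "ip[192.168.1.1]"],
       as_json=True,
   )

   # Only run the command when the console script is actually installed.
   if shutil.which(argv[0]) is not None:
       result = subprocess.run(argv, capture_output=True, text=True, check=False)
       print(result.stdout)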

TODO
----
- add more commands: *(help appreciated)*

  - reports on servers connection time
  - reports on termination state
  - reports around connections (active, frontend, backend, server)
  - *your ideas here*

- think of a way to show the commands' output in a meaningful way

- be able to specify an output format. For any command that makes sense (slow
  requests for example) output the given fields for each log line (i.e.
  acceptance date, path, downstream server, load at that time...)

- *your ideas*

.. _HAProxy: http://haproxy.1wt.eu/
.. _HTTP log format: http://cbonte.github.io/haproxy-dconv/2.2/configuration.html#8.2.3
.. _documentation and API: https://haproxy-log-analyzer.readthedocs.io/
.. _ReadTheDocs: http://readthedocs.org

CHANGES
=======

5.1.0 (2022-12-03)
------------------

- Only get the first IP from the ``X-Forwarded-For`` header.
  [gforcada]

- Improve tests robustness.
  [gforcada]

- Fix ``top_ips`` and ``top_request_paths`` commands output.
  They were showing all output, rather than only the top 10.
  [gforcada]

- Move `tests` folder to the top-level.
  [gforcada]

5.0.0 (2022-11-27)
------------------

- Drop testing on travis-ci.
  [gforcada]

- Use GitHub Actions.
  [gforcada]

- Format the code with ``pyupgrade``, ``black`` and ``isort``.
  [gforcada]

- Use ``pip-tools`` to keep dependencies locked.
  [gforcada]

- Bump python versions supported to 3.7-3.11 and pypy.
  [gforcada]

- Drop python 3.6 (EOL).
  [gforcada]

4.1.0 (2020-01-06)
------------------

- **New command:** ``requests_per_hour``.
  Just like the ``requests_per_minute`` but with hour granularity.
  Idea and first implementation done by ``valleedelisle``.
  [gforcada]

- Fix parsing truncated requests.
  Idea and first implementation by ``vixns``.
  [gforcada]

4.0.0 (2020-01-06)
------------------

**BREAKING CHANGES:**

- Complete rewrite to use almost no memory, even on huge files.
  [gforcada]

- Add parallelization to make parsing faster by parsing multiple lines in parallel.
  [gforcada]

- Rename command ``counter_slow_requests`` to ``slow_requests_counter``,
  so it is aligned with all other ``_counter`` commands.
  [gforcada]

- Changed the ``counter_invalid`` command to a new command line switch ``--invalid``.
  [gforcada]

**Regular changes:**

- Drop Python 2 support, and test on Python 3.8.
  [gforcada]

- Remove the pickling support.
  [gforcada]

- Add ``--json`` output command line option.
  [valleedelisle]

3.0.0 (2019-06-10)
------------------

- Fix spelling.
  [EdwardBetts]

- Make ``ip_counter`` use ``client_ip`` by default.
  [vixns]

- Overhaul testing environment. Test on python 3.7 as well. Use black to format.
  [gforcada]

2.1 (2017-07-06)
----------------
- Enforce QA checks (flake8) on code.
  All code has been updated to follow it.
  [gforcada]

- Support Python 3.6.
  [gforcada]

- Support different syslog timestamps (at least NixOS).
  [gforcada]

2.0.2 (2016-11-17)
------------------

- Improve performance for ``cmd_print``.
  [kevinjqiu]

2.0.1 (2016-10-29)
------------------

- Allow hostnames to have dots in them.
  [gforcada]

2.0 (2016-07-06)
----------------
- Handle unparsable HTTP requests.
  [gforcada]

- Only test on Python 2.7 and 3.5.
  [gforcada]

2.0b0 (2016-04-18)
------------------
- Check the divisor before doing a division to not get ``ZeroDivisionError`` exceptions.
  [gforcada]

2.0a0 (2016-03-29)
------------------
- Major refactoring:

  - Rename modules and classes:

    - ``haproxy_logline`` -> ``line``
    - ``haproxy_logfile`` -> ``logfile``
    - ``HaproxyLogLine`` -> ``Line``
    - ``HaproxyLogFile`` -> ``Log``

  - Parse the log file on ``Log()`` creation (i.e. in its ``__init__``)

  [gforcada]

1.3 (2016-03-29)
----------------

- New filter: ``filter_wait_on_queues``.
  Get all requests that waited at most X milliseconds on HAProxy queues.
  [gforcada]

- Code/docs cleanups and add code analysis.
  [gforcada]

- Avoid using eval.
  [gforcada]

1.2.1 (2016-02-23)
------------------

- Support -1 as a status_code
  [Christopher Baines]

1.2 (2015-12-07)
----------------

- Allow a hostname on the syslog part (not only IPs)
  [danny crasto]

1.1 (2015-04-19)
----------------

- Make syslog optional.
  Fixes issue https://github.com/gforcada/haproxy_log_analysis/issues/10.
  [gforcada]

1.0 (2015-03-24)
----------------

- Fix issue #9.
  The parsing of the syslog part of the log line was too strict:
  it expected the hostname to be a string and
  failed if it was an IP.
  [gforcada]

0.0.3.post2 (2015-01-05)
------------------------

- Finally really fixed issue #7.
  ``namespace_packages`` was not meant to be on setup.py at all.
  Silly copy&paste mistake.
  [gforcada]

0.0.3.post (2015-01-04)
-----------------------

- Fix release on PyPI.
  Solves GitHub issue #7.
  https://github.com/gforcada/haproxy_log_analysis/issues/7
  [gforcada]

0.0.3 (2014-07-09)
------------------

- Fix release on PyPI (again).
  [gforcada]

0.0.2 (2014-07-09)
------------------

- Fix release on PyPI.
  [gforcada]

0.0.1 (2014-07-09)
------------------

- Pickle ``HaproxyLogFile`` data for faster performance.
  [gforcada]

- Add a way to negate the filters, so that instead of being able to filter by
  IP, it can output all but that IP information.
  [gforcada]

- Add lots of filters: ip, path, ssl, backend, frontend, server, status_code
  and so on. See ``--list-filters`` for a complete list of them.
  [gforcada]

- Add ``HaproxyLogFile.parse_data`` method to get data from a data stream.
  It allows you to use it as a library.
  [bogdangi]

- Add ``--list-filters`` argument on the command line interface.
  [gforcada]

- Add ``--filter`` argument on the command line interface, inspired by
  Bogdan's early design.
  [bogdangi] [gforcada]

- Create a new module ``haproxy.filters`` that holds all available filters.
  [gforcada]

- Improve ``HaproxyLogFile.cmd_queue_peaks`` output to not only show
  peaks but also when requests started to queue and when they finished and
  the amount of requests that had been queued.
  [gforcada]

- Show help when no argument is given.
  [gforcada]

- Polish documentation and docstrings here and there.
  [gforcada]

- Add a ``--list-commands`` argument on the command line interface.
  [gforcada]

- Generate an API doc for ``HaproxyLogLine`` and ``HaproxyLogFile``.
  [bogdangi]

- Create a ``console_script`` ``haproxy_log_analysis`` for ease of use.
  [bogdangi]

- Add Sphinx documentation system, still empty.
  [gforcada]

- Keep valid log lines sorted so that the exact order of connections is kept.
  [gforcada]

- Add quite a few commands, see `README.rst`_ for a complete list of them.
  [gforcada]

- Run commands passed as arguments (with -c flag).
  [gforcada]

- Add a requirements.txt file to keep track of dependencies and pin them.
  [gforcada]

- Add travis_ and coveralls_ support. See its badges on `README.rst`_.
  [gforcada]

- Add argument parsing and custom validation logic for all arguments.
  [gforcada]

- Add regular expressions for haproxy log lines (HTTP format) and to
  parse HTTP requests path.
  Added tests to ensure they work as expected.
  [gforcada]

- Create distribution.
  [gforcada]

.. _travis: https://travis-ci.org/
.. _coveralls: https://coveralls.io/
.. _README.rst: http://github.com/gforcada/haproxy_log_analysis

