Source code for valjean.eponine.browser

# Copyright French Alternative Energies and Atomic Energy Commission
# Contributors: valjean developers
# valjean-support@cea.fr
#
# This software is a computer program whose purpose is to analyze and
# post-process numerical simulation results.
#
# This software is governed by the CeCILL license under French law and abiding
# by the rules of distribution of free software. You can use, modify and/ or
# redistribute the software under the terms of the CeCILL license as circulated
# by CEA, CNRS and INRIA at the following URL: http://www.cecill.info.
#
# As a counterpart to the access to the source code and rights to copy, modify
# and redistribute granted by the license, users are provided only with a
# limited warranty and the software's author, the holder of the economic
# rights, and the successive licensors have only limited liability.
#
# In this respect, the user's attention is drawn to the risks associated with
# loading, using, modifying and/or developing or reproducing the software by
# the user in light of its specific status of free software, that may mean that
# it is complicated to manipulate, and that also therefore means that it is
# reserved for developers and experienced professionals having in-depth
# computer knowledge. Users are therefore encouraged to load and test the
# software's suitability as regards their requirements in conditions enabling
# the security of their systems and/or data to be ensured and, more generally,
# to use and operate it in the same conditions as regards security.
#
# The fact that you are presently reading this means that you have had
# knowledge of the CeCILL license and that you accept its terms.

'''Module to easily access results stored in a list of dictionaries.

This module is composed of 2 classes:

  * :class:`Browser` that stores the list of dictionaries and builds an
    :class:`Index` to facilitate selections;
  * :class:`Index` based on :class:`collections.defaultdict` to perform
    selections on the list of dictionaries


The classes :class:`Index` and :class:`Browser` are meant to be general even if
they will be shown and used in a specific case: parsing results from Tripoli-4.


The :class:`Index` class
------------------------

This class inherits from :class:`collections.abc.Mapping`. It wraps a
``defaultdict(defaultdict(set))`` built with :class:`collections.defaultdict`.

Each :class:`set` contains the `int` indices of the matching dictionaries in
the list of dictionaries.

:class:`Index` is meant to be built and queried through :class:`Browser`, but
it can also be used standalone.
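
As a rough illustration of that structure, the sketch below builds a two-level
index by hand from a small list of dictionaries. It is only an approximation of
what :class:`Browser` does internally: ``build_index`` and the sample ``orders``
are hypothetical names chosen for this example, and unlike the real class it
does not add the extra ``'index'`` metadata key.

```python
from collections import defaultdict


def build_index(items, data_key='results'):
    # Hypothetical helper, not part of valjean:
    # metadata key -> metadata value -> set of positions in ``items``.
    index = defaultdict(lambda: defaultdict(set))
    for pos, item in enumerate(items):
        for key, value in item.items():
            if key != data_key:  # the data themselves are not indexed
                index[key][value].add(pos)
    return index


# Sample content, assumed for illustration only.
orders = [
    {'menu': '1', 'drink': 'beer', 'results': ['egg', 'bacon']},
    {'menu': '2', 'results': ['egg', 'spam']},
    {'menu': '1', 'drink': 'coffee', 'results': ['spam', 'egg', 'spam']},
]
index = build_index(orders)
menu_one = index['menu']['1']   # positions of the orders with menu '1'
beer = index['drink']['beer']   # positions of the orders with a beer
```

Here ``menu_one`` is ``{0, 2}`` and ``beer`` is ``{0}``: each set holds the
positions of the matching orders in the list.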


The :class:`Browser` class
--------------------------

This class is analogous to a phonebook: it contains an index and the content,
here stored as a list of dictionaries. It drives the index (building and
selections). Examples are shown below.


.. _browser-example:

Building the browser
^^^^^^^^^^^^^^^^^^^^

Let's consider a group of friends going to a restaurant and ordering their
menus. For each of them the waiter has to remember their name under
``'consumer'``, their choice of menu under ``'menu'``, their drink (if any)
under ``'drink'``, the dishes they precisely order under ``'results'`` and
optionally the number corresponding to their choice of dessert under
``'dessert'``. He represents these orders as a list, each order being a
dictionary.

>>> from valjean.eponine.browser import Browser
>>> from pprint import pprint
>>> orders = [
... {'menu': '1', 'consumer': 'Terry', 'drink': 'beer',
...  'results': {'ingredients_res': ['egg', 'bacon']}},
... {'menu': '2', 'consumer': 'John',
...  'results': [{'ingredients_res': ['egg', 'spam']},
...              {'ingredients_res': ['tomato', 'spam', 'bacon']}]},
... {'menu': '1', 'consumer': 'Graham', 'drink': 'coffee',
...  'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]},
... {'menu': '3', 'consumer': 'Eric', 'drink': 'beer',
...  'results': {'ingredients_res': ['sausage'],
...              'side_res': 'baked beans'}},
... {'menu': 'royal', 'consumer': 'Michael', 'drink': 'brandy', 'dessert': 3,
...  'results': {'dish_res': ['lobster thermidor', 'Mornay sauce']}}]
>>> com_br = Browser(orders)
>>> print(com_br)
Browser object -> Number of content items: 5, data key: 'results', \
available metadata keys: ['consumer', 'dessert', 'drink', 'index', 'menu']
               -> Number of globals: 0

Global variables, normally common to all the results passed under the argument
``content``, can be added as a dictionary through ``global_vars``.

>>> global_vars = {'table': 42, 'service_time': 300, 'priority': -1}
>>> com_br = Browser(orders, global_vars=global_vars)
>>> print(com_br)
Browser object -> Number of content items: 5, data key: 'results', \
available metadata keys: ['consumer', 'dessert', 'drink', 'index', 'menu']
               -> Number of globals: 3
>>> pprint(com_br.globals)
{'priority': -1, 'service_time': 300, 'table': 42}


Selection of a given item or of a list of items from content
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Various methods are available to select one order, depending on requirements:

  * get a new Browser:

    >>> sel_br = com_br.filter_by(menu='1', drink='beer')
    >>> pprint(sel_br.content)  # doctest: +NORMALIZE_WHITESPACE
    [{'consumer': 'Terry',  'drink': 'beer', 'index': 0, 'menu': '1', \
'results': {'ingredients_res': ['egg', 'bacon']}}]

  * check if a key is present or not:

    >>> 'drink' in sel_br
    True
    >>> 'dessert' in sel_br
    False
    >>> 'dessert' in com_br
    True

    The ``'dessert'`` key is absent from the Browser resulting from the
    selection, while it is still present in the original one.

  * get the available keys (sorted here so that they can be tested in the
    doctest; otherwise ``list`` is enough):

    >>> sorted(sel_br.keys())
    ['consumer', 'drink', 'index', 'menu']
    >>> sorted(com_br.keys())
    ['consumer', 'dessert', 'drink', 'index', 'menu']

  * if the required key doesn't exist a warning is emitted:

    >>> sel_br = com_br.filter_by(quantity=5)
    >>> # prints  WARNING     browser: quantity not a valid key. Possible \
ones are ['consumer', 'dessert', 'drink', 'index', 'menu']
    >>> 'quantity' in com_br
    False

  * if the value corresponding to the key doesn't exist another warning is
    emitted:

    >>> sel_br = com_br.filter_by(drink='wine')
    >>> # prints  WARNING     browser: wine is not a valid drink

  * to get the available values for a given key (without the corresponding
    item indices):

    >>> sorted(com_br.available_values('drink'))
    ['beer', 'brandy', 'coffee']

  * if the key doesn't exist an empty tuple is returned:

    >>> sorted(com_br.available_values('quantity'))
    []

  * to directly get the content item corresponding to the selection, use the
    method :func:`Browser.select_by`:

    >>> sel_br = com_br.select_by(consumer='Graham')
    >>> type(sel_br)
    <class 'dict'>
    >>> len(sel_br)
    5
    >>> pprint(sel_br)  # doctest: +NORMALIZE_WHITESPACE
    {'consumer': 'Graham', 'drink': 'coffee', 'index': 2, 'menu': '1', \
'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]}

  * this does not work when several items correspond to the selection:

    >>> sel_br = com_br.select_by(drink='beer')
    Traceback (most recent call last):
            [...]
    valjean.eponine.browser.TooManyItemsBrowserError: Several content items \
correspond to your choice, please refine your selection using additional \
keywords

  * if no item corresponds to the selection another exception is raised:

    >>> sel_br = com_br.select_by(menu='4')
    Traceback (most recent call last):
            [...]
    valjean.eponine.browser.NoItemBrowserError: No item corresponding to the \
selection.
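
The selection behaviour listed above can be pictured as an intersection of
index sets followed by a check on the number of matches. The sketch below is a
simplified stand-in, not the actual implementation: ``matching_positions``,
``select_one`` and the hand-written ``index`` are assumptions made for
illustration, and plain :class:`LookupError` replaces the dedicated Browser
exceptions.

```python
def matching_positions(index, n_items, **criteria):
    # Hypothetical helper: intersect the position sets of every criterion.
    positions = set(range(n_items))
    for key, value in criteria.items():
        positions &= index.get(key, {}).get(value, set())
    return positions


def select_one(items, index, **criteria):
    # Hypothetical helper mimicking the "exactly one item" contract.
    found = sorted(matching_positions(index, len(items), **criteria))
    if not found:
        raise LookupError('no item corresponds to the selection')
    if len(found) > 1:
        raise LookupError('several items correspond to the selection')
    return items[found[0]]


# Sample content and index, written by hand for illustration only.
items = [
    {'consumer': 'Terry', 'menu': '1', 'drink': 'beer'},
    {'consumer': 'Graham', 'menu': '1', 'drink': 'coffee'},
    {'consumer': 'Eric', 'menu': '3', 'drink': 'beer'},
]
index = {
    'consumer': {'Terry': {0}, 'Graham': {1}, 'Eric': {2}},
    'menu': {'1': {0, 1}, '3': {2}},
    'drink': {'beer': {0, 2}, 'coffee': {1}},
}
terry = select_one(items, index, menu='1', drink='beer')
```

In this sketch ``terry`` is Terry's order, while asking only for
``drink='beer'`` (two matches) or for ``menu='4'`` (no match) raises
:class:`LookupError`.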

Module API
----------
'''

import logging
from collections import defaultdict
from collections.abc import Mapping, Container


LOGGER = logging.getLogger(__name__)


def _make_defaultdict_set():
    '''The sole purpose of this function is to give a name to the defaultdict
    factory in :meth:`Index.index`. Without a name, the :class:`Index` class
    cannot be serialized by :mod:`pickle`.
    '''
    return defaultdict(set)


class Index(Mapping):
    '''Class to describe index used in Browser.

    The structure of Index is a ``defaultdict(defaultdict(set))``.
    This class was derived mainly for pretty-printing purposes.

    Quick example of index (menu for 4 persons, identified by numbers, one has
    no drink):

    >>> from valjean.eponine.browser import Index
    >>> myindex = Index()
    >>> myindex.index['drink']['beer'] = {1, 4}
    >>> myindex.index['drink']['wine'] = {2}
    >>> myindex.index['menu']['spam'] = {1, 3}
    >>> myindex.index['menu']['egg'] = {2}
    >>> myindex.index['menu']['bacon'] = {4}
    >>> myindex.dump(sort=True)
    "{'drink': {'beer': {1, 4}, 'wine': {2}}, \
'menu': {'bacon': {4}, 'egg': {2}, 'spam': {1, 3}}}"
    >>> 'drink' in myindex
    True
    >>> 'consumer' in myindex
    False
    >>> len(myindex)
    2
    >>> for k in sorted(myindex):
    ...     print(k, sorted(myindex[k].keys()))
    drink ['beer', 'wine']
    menu ['bacon', 'egg', 'spam']

    The :func:`keep_only` method returns a sub-Index for a given set of ids
    (int), removing all keys not involved in the corresponding ids:

    >>> myindex.keep_only({2}).dump(sort=True)
    "{'drink': {'wine': {2}}, 'menu': {'egg': {2}}}"
    >>> menu_clients14 = myindex.keep_only({1, 4})
    >>> sorted(menu_clients14.keys()) == ['drink', 'menu']
    True
    >>> list(menu_clients14['drink'].keys()) == ['beer']
    True
    >>> list(menu_clients14['drink'].values()) == [{1, 4}]
    True
    >>> sorted(menu_clients14['menu'].keys()) == ['bacon', 'spam']
    True
    >>> menu_clients14['menu']['spam'] == {1}
    True
    >>> 3 in menu_clients14['menu']['spam']
    False
    >>> menu_client3 = myindex.keep_only({3})
    >>> list(menu_client3.keys()) == ['menu']
    True
    >>> 'drink' in menu_client3
    False

    The key ``'drink'`` has been removed from the last index as 3 did not
    require it.

    If you print an :class:`Index`, it looks like a standard dictionary
    (``{ ... }`` instead of ``defaultdict(...)``) but the keys are not sorted:

    >>> print(myindex)
    {...: {...: {...}...}}
    '''

    def __init__(self):
        # outer key -> inner key -> set of item ids
        self.index = defaultdict(_make_defaultdict_set)

    def __str__(self):
        # Render as nested plain dictionaries, in insertion order.
        lstr = ["{"]
        for i, (key, dset) in enumerate(list(self.index.items())):
            lstr.append(f'{key!r}: {{')
            for j, (dkey, ind) in enumerate(list(dset.items())):
                lstr.append(f'{dkey!r}: {ind!r}')
                if j < len(dset) - 1:
                    lstr.append(', ')
            lstr.append('}')
            if i < len(self.index) - 1:
                lstr.append(', ')
        lstr.append('}')
        return ''.join(lstr)

    def __repr__(self):
        return self.index.__repr__()

    def __getitem__(self, item):
        return self.index.__getitem__(item)

    def __len__(self):
        return len(self.index)

    def __iter__(self):
        return iter(self.index)

    def __contains__(self, key):
        return self.index.__contains__(key)

    def keep_only(self, ids):
        '''Get an :class:`Index` containing only the relevant keywords for the
        required set of ids.

        :param set(int) ids: index corresponding to the required elements of
            the list of content items
        :returns: Index only containing the keys involved in the ids
        '''
        assert isinstance(ids, set)
        lind = Index()
        if not ids:
            return lind
        for key in self.index:
            for kwd, kset in self.index[key].items():
                # keep only the ids that belong to the requested set
                tmpset = kset & ids
                if tmpset:
                    lind[key][kwd] = tmpset
        return lind

    def dump(self, *, sort=False):
        '''Dump the Index.

        If ``sort == False`` (default case), returns :func:`__str__` result,
        else returns sorted Index (alphabetic order for keys).
        '''
        if sort:
            lstr = ["{"]
            for i, (key, dset) in enumerate(sorted(self.index.items())):
                lstr.append(f'{key!r}: {{')
                for j, (dkey, ind) in enumerate(sorted(dset.items(),
                                                       key=str)):
                    lstr.append(f'{dkey!r}: {ind!r}')
                    if j < len(dset) - 1:
                        lstr.append(', ')
                lstr.append('}')
                if i < len(self.index) - 1:
                    lstr.append(', ')
            lstr.append('}')
            return ''.join(lstr)
        return str(self)


class Browser(Container):
    '''Class to perform selections on results.

    This class is based on four objects:

      * the content, as a list of dictionaries (containing data and metadata)
      * the key corresponding to data in the dictionary (default='results')
      * an index based on content elements allowing easy selections on each
        metadata
      * a dictionary corresponding to global variables (common to all content
        items).

    Initialization parameters:

    :param list(dict) content: list of items containing data and metadata
    :param str data_key: key in the content items corresponding to results or
        data, that should not be used in index (as always present and
        mandatory)
    :param dict global_vars: global variables (optional, default=None)

    An additional key is added at the Index construction: ``'index'``, in
    order to keep track of the order of the list and to be able to do
    selections on it.

    Examples on development / debugging methods:

    Let's use the example detailed above in
    :ref:`module introduction <browser-example>`:

    >>> from valjean.eponine.browser import Browser
    >>> orders = [
    ... {'menu': '1', 'consumer': 'Terry', 'drink': 'beer',
    ...  'results': {'ingredients_res': ['egg', 'bacon']}},
    ... {'menu': '2', 'consumer': 'John',
    ...  'results': [{'ingredients_res': ['egg', 'spam']},
    ...              {'ingredients_res': ['tomato', 'spam', 'bacon']}]},
    ... {'menu': '1', 'consumer': 'Graham', 'drink': 'coffee',
    ...  'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]},
    ... {'menu': '3', 'consumer': 'Eric', 'drink': 'beer',
    ...  'results': {'ingredients_res': ['sausage'],
    ...              'side_res': 'baked beans'}},
    ... {'menu': 'royal', 'consumer': 'Michael', 'drink': 'brandy',
    ...  'dessert': 3,
    ...  'results': {'dish_res': ['lobster thermidor', 'Mornay sauce']}}]
    >>> com_br = Browser(orders)

      * possibility to get the item id directly (internally used method):

        >>> ind = com_br._filter_items_id_by(drink='coffee')
        >>> isinstance(ind, set)
        True
        >>> print(ind)
        {2}

      * possibility to get the index of the content element stripped without
        rebuilding the full Browser:

        >>> ind = com_br._filter_index_by(menu='1')
        >>> isinstance(ind, Index)
        True
        >>> ind.dump(sort=True)  # doctest: +NORMALIZE_WHITESPACE
        "{'consumer': {'Graham': {2}, 'Terry': {0}}, \
'drink': {'beer': {0}, 'coffee': {2}}, 'index': {0: {0}, 2: {2}}, \
'menu': {'1': {0, 2}}}"

        The 'dessert' key has been stripped from the index:

        >>> 'dessert' in ind
        False

    Debug print is available thanks to :func:`__repr__`:

    >>> small_order = [{'dessert': 1, 'drink': 'beer', 'results': ['spam']}]
    >>> so_br = Browser(small_order)
    >>> f"{so_br!r}"
    "<class 'valjean.eponine.browser.Browser'>, (Content items: ..., \
Index: ...)"
    '''

    def __init__(self, content, data_key='results', global_vars=None):
        # shallow copies: the 'index' key is added to the copies only
        self.content = [r.copy() for r in content]
        self.data_key = data_key
        self.index = self._build_index()
        LOGGER.debug("Index: %s", self.index)
        self.globals = (global_vars.copy() if isinstance(global_vars, dict)
                        else {})

    def __eq__(self, other):
        return (self.content == other.content
                and self.data_key == other.data_key
                and self.globals == other.globals)

    def __ne__(self, other):
        return not self == other

    def _build_index(self):
        '''Build index from all content elements in the list.

        Keys of the sets are keywords used to describe the items and/or the
        scores (if flat case). The data stored under ``self.data_key`` are
        not indexed.

        :returns: :class:`Index`
        '''
        index = Index()
        for ielt, elt in enumerate(self.content):
            # keep track of the position of the item in the list
            elt['index'] = ielt
            for key in elt:
                if key != self.data_key:
                    index[key][elt[key]].add(ielt)
        return index

    def __contains__(self, key):
        return key in self.index

    def __len__(self):
        return len(self.content)

    def is_empty(self):
        '''Check if the Browser is empty or not.

        Empty meaning no elements in content AND no globals.
        '''
        return not self.content and not self.globals

    def merge(self, other):
        '''Merge two browsers.

        This method merges 2 browsers: the *other* one then appears at the end
        of the *self* one. Global variables are also merged. The new index
        corresponds to the merged case.

        :param Browser other: another browser
        :rtype: Browser
        '''
        if self.data_key != other.data_key:
            raise ValueError('Same data_key is required to merge Browsers')
        if self.globals != other.globals:
            LOGGER.debug('globals will be updated with other values')
        new_glob = self.globals.copy()
        new_glob.update(other.globals)
        new_content = self.content + other.content
        LOGGER.debug("Nb self items (%d) + Nb other items (%d) = %d",
                     len(self), len(other), len(new_content))
        return Browser(new_content, data_key=self.data_key,
                       global_vars=new_glob)

    def keys(self):
        '''Get the available keys in the index (so in the items list).

        The keys are returned as a tuple.
        '''
        return tuple(self.index.keys())

    def available_values(self, key):
        '''Get the available keys in the *second* level, i.e. under the given
        external one.

        :param str key: 'external' key (from outer defaultdict)
        :returns: tuple of the corresponding keys (empty if the key is
            unknown)
        '''
        if key in self:
            return tuple(self.index[key].keys())
        return ()

    def __str__(self):
        cls_name = self.__class__.__name__
        # the padding lines up the second '->' under the first one
        return (f"{cls_name} object -> "
                f"Number of content items: {len(self.content)}, "
                f"data key: {self.data_key!r}, "
                f"available metadata keys: {sorted(self.keys())}\n"
                f"{'':>{len(cls_name)}}        "
                f"-> Number of globals: {len(self.globals)}")

    def __repr__(self):
        return (f"{self.__class__}, (Content items: {self.content!r}, "
                f"Index: {self.index!r})")

    def _filter_items_id_by(self, **kwargs):
        '''Selection of content items indices according to kwargs criteria.

        :param \\**\\kwargs: keyword arguments to specify the required
            response. More than one are allowed.
        :return: set of ids
        :rtype: set(int)
        '''
        itemids = set(range(len(self.content)))
        for kwd, kwarg in kwargs.items():
            if kwd not in self.index:
                LOGGER.warning("%s not a valid key. Possible ones are %s",
                               kwd, sorted(self.keys()))
                return set()
            if kwarg not in self.index[kwd]:
                LOGGER.warning("%s is not a valid %s", kwarg, kwd)
                return set()
            # keep only the items satisfying all criteria seen so far
            itemids = itemids & self.index[kwd][kwarg]
            if not itemids:
                LOGGER.warning("Wrong selection, item might be not present. "
                               "Also check if requirements are consistent.")
                return set()
        return itemids

    def _filter_index_by(self, **kwargs):
        '''Get index corresponding to selection given thanks to keyword
        arguments.

        :param \\**\\kwargs: keyword arguments to specify the required item.
            More than one are allowed.
        :returns: :class:`Index` (stripped from useless keys)
        '''
        itemids = self._filter_items_id_by(**kwargs)
        return self.index.keep_only(itemids)

    def filter_by(self, include=(), exclude=(), **kwargs):
        '''Get a Browser corresponding to selection from keyword arguments.

        :param \\**\\kwargs: keyword arguments to specify the required item.
            More than one are allowed.
        :param tuple(str) include: metadata keys required in the content items
            but for which the value is not necessarily known
        :param tuple(str) exclude: metadata that should not be present in the
            items and for which the value is not necessarily known
        :returns: :class:`Browser` (subset of the default one, corresponding
            to the selection)
        '''
        LOGGER.debug("in filter_by, kwargs=%s", kwargs)
        sincl, sexcl = set(include), set(exclude)
        respids = self._filter_items_id_by(**kwargs)
        lresp = [self.content[i] for i in sorted(respids)
                 if sincl.issubset(self.content[i])
                 and not sexcl.intersection(self.content[i])]
        # propagate data_key so that a non-default data key is not indexed
        sub_br = Browser(lresp, data_key=self.data_key,
                         global_vars=self.globals)
        return sub_br

    def select_by(self, *, include=(), exclude=(), **kwargs):
        '''Get the single item from content corresponding to selection from
        keyword arguments.

        :param \\**\\kwargs: keyword arguments to specify the required items.
            More than one are allowed.
        :param tuple(str) include: metadata keys required in the items but for
            which the value is not necessarily known
        :param tuple(str) exclude: metadata that should not be present in the
            items and for which the value is not necessarily known
        :raises NoItemBrowserError: if no item corresponds to the selection
        :raises TooManyItemsBrowserError: if more than one item corresponds to
            the provided keywords
        :rtype: dict
        '''
        respids = self._filter_items_id_by(**kwargs)
        sincl, sexcl = set(include), set(exclude)
        litems = [self.content[i] for i in sorted(respids)
                  if sincl.issubset(self.content[i])
                  and not sexcl.intersection(self.content[i])]
        if not litems:
            raise NoItemBrowserError("No item corresponding to the selection.")
        if len(litems) > 1:
            raise TooManyItemsBrowserError(
                "Several content items correspond to your choice, "
                "please refine your selection using additional keywords")
        return litems[0]


class TooManyItemsBrowserError(LookupError):
    '''Error raised by :class:`Browser` when too many items correspond to the
    requested selection.
    '''


class NoItemBrowserError(LookupError):
    '''Error raised by :class:`Browser` when no item corresponds to the
    selection.
    '''