# Copyright French Alternative Energies and Atomic Energy Commission
# Contributors: valjean developers
# valjean-support@cea.fr
#
# This software is a computer program whose purpose is to analyze and
# post-process numerical simulation results.
#
# This software is governed by the CeCILL license under French law and abiding
# by the rules of distribution of free software. You can use, modify and/ or
# redistribute the software under the terms of the CeCILL license as circulated
# by CEA, CNRS and INRIA at the following URL: http://www.cecill.info.
#
# As a counterpart to the access to the source code and rights to copy, modify
# and redistribute granted by the license, users are provided only with a
# limited warranty and the software's author, the holder of the economic
# rights, and the successive licensors have only limited liability.
#
# In this respect, the user's attention is drawn to the risks associated with
# loading, using, modifying and/or developing or reproducing the software by
# the user in light of its specific status of free software, that may mean that
# it is complicated to manipulate, and that also therefore means that it is
# reserved for developers and experienced professionals having in-depth
# computer knowledge. Users are therefore encouraged to load and test the
# software's suitability as regards their requirements in conditions enabling
# the security of their systems and/or data to be ensured and, more generally,
# to use and operate it in the same conditions as regards security.
#
# The fact that you are presently reading this means that you have had
# knowledge of the CeCILL license and that you accept its terms.
'''Module to easily access results stored in a list of dictionaries.

This module is composed of 2 classes:

  * :class:`Browser`, which stores the list of dictionaries and builds an
    :class:`Index` to facilitate selections;
  * :class:`Index`, based on :class:`collections.defaultdict`, which performs
    selections on the list of dictionaries.

The classes :class:`Index` and :class:`Browser` are meant to be general, even
though they are shown and used here in a specific case: parsing results from
Tripoli-4.

The :class:`Index` class
------------------------

This class inherits from :class:`collections.abc.Mapping` and wraps a
``defaultdict(defaultdict(set))`` built from :class:`collections.defaultdict`.
Each :class:`set` contains the `int` positions, in the list of dictionaries, of
the dictionaries matching the corresponding key and value.

:class:`Index` is meant to be driven by :class:`Browser`, although it can also
be used standalone.

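
As a rough illustration of the nested structure described above, the sketch
below builds a ``defaultdict(defaultdict(set))`` from a small list of
dictionaries using only the standard library. The helper ``build_index`` and
the sample data are hypothetical and not part of this module; unlike
:class:`Browser`, the sketch does not add an ``'index'`` key and uses a
``lambda`` factory for brevity.

```python
from collections import defaultdict


def build_index(items, data_key='results'):
    # Hypothetical helper (not part of valjean): maps each metadata key to
    # its values, and each value to the set of positions where it appears.
    index = defaultdict(lambda: defaultdict(set))
    for pos, item in enumerate(items):
        for key, value in item.items():
            if key != data_key:  # the data itself is never indexed
                index[key][value].add(pos)
    return index


orders = [
    {'menu': '1', 'drink': 'beer', 'results': ['egg']},
    {'menu': '2', 'results': ['spam']},
    {'menu': '1', 'drink': 'coffee', 'results': ['bacon']},
]
index = build_index(orders)
```

Here ``index['menu']['1']`` holds the positions of the two orders for menu
``'1'``, and the data key never appears in the index.
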
The :class:`Browser` class
--------------------------

This class is analogous to a phonebook: it contains an index and the content,
here stored as a list of dictionaries. It drives the index (building and
selections). Examples are shown below.

.. _browser-example:

Building the browser
^^^^^^^^^^^^^^^^^^^^

Let's consider a bunch of friends going to the restaurant and ordering their
menus. For each of them the waiter has to remember their name under
``'consumer'``, their choice of menu under ``'menu'``, their drink under
``'drink'``, the dish they actually order under ``'results'`` and optionally
the number corresponding to their choice of dessert under ``'dessert'``. He
represents these orders as a list, each order being a dictionary.

>>> from valjean.eponine.browser import Browser
>>> from pprint import pprint
>>> orders = [
...     {'menu': '1', 'consumer': 'Terry', 'drink': 'beer',
...      'results': {'ingredients_res': ['egg', 'bacon']}},
...     {'menu': '2', 'consumer': 'John',
...      'results': [{'ingredients_res': ['egg', 'spam']},
...                  {'ingredients_res': ['tomato', 'spam', 'bacon']}]},
...     {'menu': '1', 'consumer': 'Graham', 'drink': 'coffee',
...      'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]},
...     {'menu': '3', 'consumer': 'Eric', 'drink': 'beer',
...      'results': {'ingredients_res': ['sausage'],
...                  'side_res': 'baked beans'}},
...     {'menu': 'royal', 'consumer': 'Michael', 'drink': 'brandy',
...      'dessert': 3,
...      'results': {'dish_res': ['lobster thermidor', 'Mornay sauce']}}]
>>> com_br = Browser(orders)
>>> print(com_br)
Browser object -> Number of content items: 5, data key: 'results', \
available metadata keys: ['consumer', 'dessert', 'drink', 'index', 'menu']
        -> Number of globals: 0

Global variables, normally common to all the items passed in the ``content``
argument, can be added as a dictionary:

>>> global_vars = {'table': 42, 'service_time': 300, 'priority': -1}
>>> com_br = Browser(orders, global_vars=global_vars)
>>> print(com_br)
Browser object -> Number of content items: 5, data key: 'results', \
available metadata keys: ['consumer', 'dessert', 'drink', 'index', 'menu']
        -> Number of globals: 3
>>> pprint(com_br.globals)
{'priority': -1, 'service_time': 300, 'table': 42}


Selection of a given item or of a list of items from content
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Various methods are available to select one order, depending on requirements:

  * get a new Browser:

    >>> sel_br = com_br.filter_by(menu='1', drink='beer')
    >>> pprint(sel_br.content)  # doctest: +NORMALIZE_WHITESPACE
    [{'consumer': 'Terry', 'drink': 'beer', 'index': 0, 'menu': '1', \
'results': {'ingredients_res': ['egg', 'bacon']}}]

  * check if a key is present or not:

    >>> 'drink' in sel_br
    True
    >>> 'dessert' in sel_br
    False
    >>> 'dessert' in com_br
    True

    The ``'dessert'`` key has been removed from the Browser resulting from the
    selection, while it is still present in the original one.

  * get the available keys (sorted here only to make the doctest output
    deterministic):

    >>> sorted(sel_br.keys())
    ['consumer', 'drink', 'index', 'menu']
    >>> sorted(com_br.keys())
    ['consumer', 'dessert', 'drink', 'index', 'menu']

  * if the required key doesn't exist, a warning is emitted:

    >>> sel_br = com_br.filter_by(quantity=5)
    >>> # prints  WARNING     browser: quantity not a valid key. Possible \
ones are ['consumer', 'dessert', 'drink', 'index', 'menu']
    >>> 'quantity' in com_br
    False

  * if the value corresponding to the key doesn't exist, another warning is
    emitted:

    >>> sel_br = com_br.filter_by(drink='wine')
    >>> # prints  WARNING     browser: wine is not a valid drink

  * to know the available values corresponding to a key (without the
    corresponding item indices):

    >>> sorted(com_br.available_values('drink'))
    ['beer', 'brandy', 'coffee']

  * if the key doesn't exist, an empty tuple is returned:

    >>> sorted(com_br.available_values('quantity'))
    []

  * to directly get the content item corresponding to the selection, use the
    method :meth:`Browser.select_by`:

    >>> sel_br = com_br.select_by(consumer='Graham')
    >>> type(sel_br)
    <class 'dict'>
    >>> len(sel_br)
    5
    >>> pprint(sel_br)  # doctest: +NORMALIZE_WHITESPACE
    {'consumer': 'Graham', 'drink': 'coffee', 'index': 2, 'menu': '1', \
'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]}

  * this does not work when several items correspond to the selection:

    >>> sel_br = com_br.select_by(drink='beer')
    Traceback (most recent call last):
        [...]
    valjean.eponine.browser.TooManyItemsBrowserError: Several content items \
correspond to your choice, please refine your selection using additional \
keywords

  * if no item corresponds to the selection, another exception is raised:

    >>> sel_br = com_br.select_by(menu='4')
    Traceback (most recent call last):
        [...]
    valjean.eponine.browser.NoItemBrowserError: No item corresponding to the \
selection.


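
The selections above can be pictured as set intersections on the index: each
keyword picks one set of item positions, and only the positions common to all
of them are kept. The sketch below illustrates this idea with plain
dictionaries; ``select_ids`` and the sample ``index`` are hypothetical names
used for illustration only, and the warnings emitted by :class:`Browser` are
left out.

```python
def select_ids(index, n_items, **criteria):
    # Hypothetical helper (not part of valjean): start from all positions and
    # intersect with the set stored for each requested key/value pair.
    ids = set(range(n_items))
    for key, value in criteria.items():
        # Unknown keys or values contribute an empty set, so nothing matches.
        ids &= index.get(key, {}).get(value, set())
    return ids


index = {
    'menu': {'1': {0, 2}, '2': {1}, '3': {3}},
    'drink': {'beer': {0, 3}, 'coffee': {2}},
}
both = select_ids(index, 4, menu='1', drink='beer')
beer = select_ids(index, 4, drink='beer')
none = select_ids(index, 4, drink='wine')
```

A single remaining position corresponds to the case where one item can be
returned directly, several positions to an ambiguous selection, and an empty
set to a selection with no match.
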
Module API
----------
'''
import logging
from collections import defaultdict
from collections.abc import Mapping, Container

LOGGER = logging.getLogger(__name__)


def _make_defaultdict_set():
    '''The sole purpose of this function is to give a name to the defaultdict
    factory used for :attr:`Index.index`. Without a name (for instance with a
    ``lambda``), the :class:`Index` class cannot be serialized by
    :mod:`pickle`.
    '''
    return defaultdict(set)


class Index(Mapping):
    '''Class to describe index used in Browser.

    The structure of Index is a ``defaultdict(defaultdict(set))``. This class
    was derived mainly for pretty-printing purposes.

    Quick example of index (menu for 4 persons, identified by numbers, one has
    no drink):

    >>> from valjean.eponine.browser import Index
    >>> myindex = Index()
    >>> myindex.index['drink']['beer'] = {1, 4}
    >>> myindex.index['drink']['wine'] = {2}
    >>> myindex.index['menu']['spam'] = {1, 3}
    >>> myindex.index['menu']['egg'] = {2}
    >>> myindex.index['menu']['bacon'] = {4}
    >>> myindex.dump(sort=True)
    "{'drink': {'beer': {1, 4}, 'wine': {2}}, \
'menu': {'bacon': {4}, 'egg': {2}, 'spam': {1, 3}}}"
    >>> 'drink' in myindex
    True
    >>> 'consumer' in myindex
    False
    >>> len(myindex)
    2
    >>> for k in sorted(myindex):
    ...     print(k, sorted(myindex[k].keys()))
    drink ['beer', 'wine']
    menu ['bacon', 'egg', 'spam']

    The :meth:`keep_only` method returns a sub-Index for a given set of ids
    (int), removing all keys and values not involved in the corresponding ids:

    >>> myindex.keep_only({2}).dump(sort=True)
    "{'drink': {'wine': {2}}, 'menu': {'egg': {2}}}"
    >>> menu_clients14 = myindex.keep_only({1, 4})
    >>> sorted(menu_clients14.keys()) == ['drink', 'menu']
    True
    >>> list(menu_clients14['drink'].keys()) == ['beer']
    True
    >>> list(menu_clients14['drink'].values()) == [{1, 4}]
    True
    >>> sorted(menu_clients14['menu'].keys()) == ['bacon', 'spam']
    True
    >>> menu_clients14['menu']['spam'] == {1}
    True
    >>> 3 in menu_clients14['menu']['spam']
    False
    >>> menu_client3 = myindex.keep_only({3})
    >>> list(menu_client3.keys()) == ['menu']
    True
    >>> 'drink' in menu_client3
    False

    The key ``'drink'`` has been removed from the last index as client 3 did
    not order a drink.

    If you print an :class:`Index`, it looks like a standard dictionary (``{
    ... }`` instead of ``defaultdict(...)``) but the keys are not sorted:

    >>> print(myindex)
    {...: {...: {...}...}}
    '''

    def __init__(self):
        self.index = defaultdict(_make_defaultdict_set)

    def __str__(self):
        lstr = ["{"]
        for i, (key, dset) in enumerate(list(self.index.items())):
            lstr.append(f'{key!r}: {{')
            for j, (dkey, ind) in enumerate(list(dset.items())):
                lstr.append(f'{dkey!r}: {ind!r}')
                if j < len(dset) - 1:
                    lstr.append(', ')
            lstr.append('}')
            if i < len(self.index) - 1:
                lstr.append(', ')
        lstr.append('}')
        return ''.join(lstr)

    def __repr__(self):
        return self.index.__repr__()

    def __getitem__(self, item):
        return self.index.__getitem__(item)

    def __len__(self):
        return len(self.index)

    def __iter__(self):
        return iter(self.index)

    def __contains__(self, key):
        return self.index.__contains__(key)

    def keep_only(self, ids):
        '''Get an :class:`Index` containing only the relevant keywords for the
        required set of ids.

        :param set(int) ids: index corresponding to the required elements of
            the list of content items
        :returns: Index only containing the keys involved in the ids
        '''
        assert isinstance(ids, set)
        lind = Index()
        if not ids:
            return lind
        for key in self.index:
            for kwd, kset in self.index[key].items():
                # keep only the ids shared with the requested set
                tmpset = kset & ids
                if tmpset:
                    lind[key][kwd] = tmpset
        return lind

    def dump(self, *, sort=False):
        '''Dump the Index.

        If ``sort == False`` (default case), returns :func:`__str__` result,
        else returns sorted Index (alphabetic order for keys).
        '''
        if sort:
            lstr = ["{"]
            for i, (key, dset) in enumerate(sorted(self.index.items())):
                lstr.append(f'{key!r}: {{')
                for j, (dkey, ind) in enumerate(sorted(dset.items(),
                                                       key=str)):
                    lstr.append(f'{dkey!r}: {ind!r}')
                    if j < len(dset) - 1:
                        lstr.append(', ')
                lstr.append('}')
                if i < len(self.index) - 1:
                    lstr.append(', ')
            lstr.append('}')
            return ''.join(lstr)
        return str(self)


class Browser(Container):
    '''Class to perform selections on results.

    This class is based on four objects:

      * the content, as a list of dictionaries (containing data and metadata)
      * the key corresponding to data in the dictionary (default='results')
      * an index based on content elements allowing easy selections on each
        metadata
      * a dictionary corresponding to global variables (common to all content
        items).

    Initialization parameters:

    :param list(dict) content: list of items containing data and metadata
    :param str data_key: key in the content items corresponding to results or
        data, that should not be used in index (as always present and
        mandatory)
    :param dict global_vars: global variables (optional, default=None)

    An additional key, ``'index'``, is added when the Index is built, in order
    to keep track of the order of the list and to allow selections on it.

    Examples on development / debugging methods:

    Let's use the example detailed above in
    :ref:`module introduction <browser-example>`:

    >>> from valjean.eponine.browser import Browser
    >>> orders = [
    ...     {'menu': '1', 'consumer': 'Terry', 'drink': 'beer',
    ...      'results': {'ingredients_res': ['egg', 'bacon']}},
    ...     {'menu': '2', 'consumer': 'John',
    ...      'results': [{'ingredients_res': ['egg', 'spam']},
    ...                  {'ingredients_res': ['tomato', 'spam', 'bacon']}]},
    ...     {'menu': '1', 'consumer': 'Graham', 'drink': 'coffee',
    ...      'results': [{'ingredients_res': ['spam', 'egg', 'spam']}]},
    ...     {'menu': '3', 'consumer': 'Eric', 'drink': 'beer',
    ...      'results': {'ingredients_res': ['sausage'],
    ...                  'side_res': 'baked beans'}},
    ...     {'menu': 'royal', 'consumer': 'Michael', 'drink': 'brandy',
    ...      'dessert': 3,
    ...      'results': {'dish_res': ['lobster thermidor', 'Mornay sauce']}}]
    >>> com_br = Browser(orders)

    * possibility to get the item id directly (internally used method):

      >>> ind = com_br._filter_items_id_by(drink='coffee')
      >>> isinstance(ind, set)
      True
      >>> print(ind)
      {2}

    * possibility to get the index of the content element stripped without
      rebuilding the full Browser:

      >>> ind = com_br._filter_index_by(menu='1')
      >>> isinstance(ind, Index)
      True
      >>> ind.dump(sort=True)  # doctest: +NORMALIZE_WHITESPACE
      "{'consumer': {'Graham': {2}, 'Terry': {0}}, \
'drink': {'beer': {0}, 'coffee': {2}}, 'index': {0: {0}, 2: {2}}, \
'menu': {'1': {0, 2}}}"

      The 'dessert' key has been stripped from the index:

      >>> 'dessert' in ind
      False

    Debug print is available thanks to :func:`__repr__`:

    >>> small_order = [{'dessert': 1, 'drink': 'beer', 'results': ['spam']}]
    >>> so_br = Browser(small_order)
    >>> f"{so_br!r}"
    "<class 'valjean.eponine.browser.Browser'>, (Content items: ..., \
Index: ...)"
    '''

    def __init__(self, content, data_key='results', global_vars=None):
        # shallow copies, so that adding the 'index' key does not modify the
        # caller's dictionaries
        self.content = [r.copy() for r in content]
        self.data_key = data_key
        self.index = self._build_index()
        LOGGER.debug("Index: %s", self.index)
        self.globals = (global_vars.copy() if isinstance(global_vars, dict)
                        else {})

    def __eq__(self, other):
        return (self.content == other.content
                and self.data_key == other.data_key
                and self.globals == other.globals)

    def __ne__(self, other):
        return not self == other

    def _build_index(self):
        '''Build index from all content elements in the list.

        Keys of the sets are keywords used to describe the items and/or the
        scores (if flat case). The key stored in ``self.data_key`` (results
        or data) is not indexed.

        :returns: :class:`Index`
        '''
        index = Index()
        for ielt, elt in enumerate(self.content):
            elt['index'] = ielt
            for key in elt:
                if key != self.data_key:
                    index[key][elt[key]].add(ielt)
        return index

    def __contains__(self, key):
        return key in self.index

    def __len__(self):
        return len(self.content)

    def is_empty(self):
        '''Check if the Browser is empty or not.

        Empty meaning no elements in content AND no globals.
        '''
        return not self.content and not self.globals

    def merge(self, other):
        '''Merge two browsers.

        This method merges 2 browsers: the content of the *other* one is
        appended at the end of the content of *self*. Global variables are
        also merged, values from *other* taking precedence. The new index
        corresponds to the merged content.

        :param Browser other: another browser
        :rtype: Browser
        '''
        if self.data_key != other.data_key:
            raise ValueError('Same data_key is required to merge Browsers')
        if self.globals != other.globals:
            LOGGER.debug('globals will be updated with other values')
        new_glob = self.globals.copy()
        new_glob.update(other.globals)
        new_content = self.content + other.content
        LOGGER.debug("Nb self items (%d) + Nb other items (%d) = %d",
                     len(self), len(other), len(new_content))
        return Browser(new_content, data_key=self.data_key,
                       global_vars=new_glob)

    def keys(self):
        '''Get the available keys in the index (so in the items list), as a
        tuple.
        '''
        return tuple(self.index.keys())

    def available_values(self, key):
        '''Get the available keys in the *second* level, i.e. under the given
        external one.

        :param str key: 'external' key (from outer defaultdict)
        :returns: tuple of the corresponding keys (empty if *key* is unknown)
        '''
        if key in self:
            return tuple(self.index[key].keys())
        return ()

    def __str__(self):
        cls_name = self.__class__.__name__
        return (f"{cls_name} object -> "
                f"Number of content items: {len(self.content)}, "
                f"data key: {self.data_key!r}, "
                f"available metadata keys: {sorted(self.keys())}\n"
                f"{'':>{len(cls_name)}} "
                f"-> Number of globals: {len(self.globals)}")

    def __repr__(self):
        return (f"{self.__class__}, (Content items: {self.content!r}, "
                f"Index: {self.index!r})")

    def _filter_items_id_by(self, **kwargs):
        '''Selection of content items indices according to kwargs criteria.

        :param \\**kwargs: keyword arguments to specify the required
            response. More than one are allowed.
        :return: set of ids
        :rtype: set(int)
        '''
        itemids = set(range(len(self.content)))
        for kwd, kwarg in kwargs.items():
            if kwd not in self.index:
                LOGGER.warning("%s not a valid key. Possible ones are %s",
                               kwd, sorted(self.keys()))
                return set()
            if kwarg not in self.index[kwd]:
                LOGGER.warning("%s is not a valid %s", kwarg, kwd)
                return set()
            # keep only the items matching all the criteria seen so far
            itemids = itemids & self.index[kwd][kwarg]
        if not itemids:
            LOGGER.warning("Wrong selection, item might be not present. "
                           "Also check if requirements are consistent.")
            return set()
        return itemids

    def _filter_index_by(self, **kwargs):
        '''Get the index corresponding to the selection defined by the
        keyword arguments.

        :param \\**kwargs: keyword arguments to specify the required item.
            More than one are allowed.
        :returns: :class:`Index` (stripped from useless keys)
        '''
        itemids = self._filter_items_id_by(**kwargs)
        return self.index.keep_only(itemids)

    def filter_by(self, include=(), exclude=(), **kwargs):
        '''Get a Browser corresponding to selection from keyword
        arguments.

        :param \\**kwargs: keyword arguments to specify the required item.
            More than one are allowed.
        :param tuple(str) include: metadata keys required in the content items
            but for which the value is not necessarily known
        :param tuple(str) exclude: metadata that should not be present in the
            items and for which the value is not necessarily known
        :returns: :class:`Browser` (subset of the default one, corresponding to
            the selection)
        '''
        LOGGER.debug("in filter_by, kwargs=%s", kwargs)
        sincl, sexcl = set(include), set(exclude)
        respids = self._filter_items_id_by(**kwargs)
        lresp = [self.content[i] for i in sorted(respids)
                 if sincl.issubset(self.content[i])
                 and not sexcl.intersection(self.content[i])]
        # propagate data_key so that a non-default one is not indexed
        sub_br = Browser(lresp, data_key=self.data_key,
                         global_vars=self.globals)
        return sub_br

    def select_by(self, *, include=(), exclude=(), **kwargs):
        '''Get the single item from content corresponding to the selection
        from keyword arguments.

        :param \\**kwargs: keyword arguments to specify the required item.
            More than one are allowed.
        :param tuple(str) include: metadata keys required in the items but for
            which the value is not necessarily known
        :param tuple(str) exclude: metadata that should not be present in the
            items and for which the value is not necessarily known
        :raises NoItemBrowserError: if no item corresponds to the selection
        :raises TooManyItemsBrowserError: if more than one item corresponds to
            the provided keywords
        :rtype: dict
        '''
        respids = self._filter_items_id_by(**kwargs)
        sincl, sexcl = set(include), set(exclude)
        litems = [self.content[i] for i in sorted(respids)
                  if sincl.issubset(self.content[i])
                  and not sexcl.intersection(self.content[i])]
        if not litems:
            raise NoItemBrowserError("No item corresponding to the selection.")
        if len(litems) > 1:
            raise TooManyItemsBrowserError(
                "Several content items correspond to your choice, "
                "please refine your selection using additional keywords")
        return litems[0]


class TooManyItemsBrowserError(LookupError):
    '''Error raised by :class:`Browser` when too many items correspond to the
    requested selection.
    '''


class NoItemBrowserError(LookupError):
    '''Error raised by :class:`Browser` when no item corresponds to the
    selection.
    '''