CSV source section
==================
A CSV source pipeline section lets you create pipeline items from CSV files.
The CSV source section blueprint name is
``collective.transmogrifier.sections.csvsource``.
A CSV source section will load the CSV file named in the ``filename``
option, or the CSV file named in an item key given by the ``key`` option,
and will yield an item for each line in the CSV file. It'll use the first
line of the CSV file to determine what keys to use, or you can provide a
``fieldnames`` option to set the key names.
The ``filename`` option may be an absolute path or a package reference,
e.g. ``my.package:foo/bar.csv``.
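For illustration, a package reference resolves to a file inside the named
package, conceptually like the following hypothetical helper (the actual
blueprint may resolve references differently)::

    import os.path
    from importlib import import_module

    def resolve_package_reference(reference):
        # "my.package:foo/bar.csv" -> absolute path of foo/bar.csv
        # inside the installed my.package distribution.
        package, relpath = reference.split(':', 1)
        module = import_module(package)
        return os.path.join(os.path.dirname(module.__file__), relpath)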
By default the CSV file is assumed to use the Excel CSV dialect, but you can
select any dialect supported by the Python ``csv`` module with the
``dialect`` option. You can also specify `fmtparams`_ using options that
start with ``fmtparam-``.
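For example, a hypothetical section reading semicolon-delimited files might
look like this sketch (``my.package:data/export.csv`` is a made-up
reference, and the ``fmtparam-`` value is an expression, per the table at
the end of this section)::

    [csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    filename = my.package:data/export.csv
    dialect = excel
    fmtparam-delimiter = string:;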
>>> import os
>>> from collective.transmogrifier import tests
>>> csvsource = """
... [transmogrifier]
... pipeline =
... csvsource
... logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = {}/csvsource.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """.format(os.path.dirname(tests.__file__))
>>> registerConfig('collective.transmogrifier.sections.tests.csvsource.file',
... csvsource)
>>> transmogrifier('collective.transmogrifier.sections.tests.csvsource.file')
>>> print(handler)
logger INFO
  {'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
logger INFO
  {'bar': 'second-bar', 'baz': 'second-baz', 'foo': 'second-foo'}
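Judging from the output above, the ``csvsource.csv`` fixture used here
looks like this::

    foo,bar,baz
    first-foo,first-bar,first-baz
    second-foo,second-bar,second-baz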
The CSV file's column field names can also be specified explicitly. Note
that when ``fieldnames`` is given, the first line of the file is no longer
treated as a header row, so it is yielded as a regular item:
>>> handler.clear()
>>> transmogrifier('collective.transmogrifier.sections.tests.csvsource.file',
... csvsource=dict(fieldnames='monty spam eggs'))
>>> print(handler)
logger INFO
  {'eggs': 'baz', 'monty': 'foo', 'spam': 'bar'}
logger INFO
  {'eggs': 'first-baz', 'monty': 'first-foo', 'spam': 'first-bar'}
logger INFO
  {'eggs': 'second-baz', 'monty': 'second-foo', 'spam': 'second-bar'}
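That header-row behaviour matches the standard library's
``csv.DictReader``, which the section presumably wraps (its ``restkey``
option, seen below, is a ``DictReader`` parameter): ``DictReader`` only
skips the first line when it has to derive the fieldnames itself. A
self-contained illustration::

    import csv, io

    data = io.StringIO("foo,bar,baz\nfirst-foo,first-bar,first-baz\n")
    for row in csv.DictReader(data, fieldnames=['monty', 'spam', 'eggs']):
        print(row)
    # {'monty': 'foo', 'spam': 'bar', 'eggs': 'baz'}
    # {'monty': 'first-foo', 'spam': 'first-bar', 'eggs': 'first-baz'}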
Here is the same example, loading a file from a package instead:
>>> csvsource = """
... [transmogrifier]
... pipeline =
... csvsource
... logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = collective.transmogrifier.tests:sample.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """
>>> registerConfig('collective.transmogrifier.sections.tests.csvsource.package',
... csvsource)
>>> handler.clear()
>>> transmogrifier('collective.transmogrifier.sections.tests.csvsource.package')
>>> print(handler)
logger INFO
  {'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
logger INFO
  {'_csvsource_rest': ['corge', 'grault'],
   'bar': 'second-bar',
   'baz': 'second-baz',
   'foo': 'second-foo'}
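The second row of the bundled ``sample.csv`` carries two extra values; as
the output shows, they are collected in a list under the rest key, which
apparently defaults to the section name wrapped as ``_csvsource_rest``.
Judging from the output, the file looks like this::

    foo,bar,baz
    first-foo,first-bar,first-baz
    second-foo,second-bar,second-baz,corge,grault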
We can also load a file from a GenericSetup (GS) import context:
>>> from collective.transmogrifier.transmogrifier import Transmogrifier
>>> from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
>>> from zope.annotation.interfaces import IAnnotations
>>> class FakeImportContext(object):
...     def __init__(self, subdir, filename, contents):
...         self.filename = filename
...         self.subdir = subdir
...         self.contents = contents
...     def readDataFile(self, filename, subdir=None):
...         if subdir is None and self.subdir is not None:
...             return None
...         if filename != self.filename:
...             return None
...         return self.contents
>>> csvsource = """
... [transmogrifier]
... pipeline =
... csvsource
... logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = importcontext:sub/dir/somefile.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """
>>> registerConfig('collective.transmogrifier.sections.tests.csvsource.gs',
... csvsource)
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext('sub/dir/', 'somefile.csv',
... """animal,name
... cow,daisy
... pig,george
... duck,archibald
... """)
>>> t('collective.transmogrifier.sections.tests.csvsource.gs')
>>> print(handler)
logger INFO
  {'animal': 'cow', 'name': 'daisy'}
logger INFO
  {'animal': 'pig', 'name': 'george'}
logger INFO
  {'animal': 'duck', 'name': 'archibald'}
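Conceptually, reading from an import context amounts to something like this
hypothetical helper (the blueprint's actual internals may differ)::

    import csv, io

    def rows_from_import_context(context, filename, subdir=None):
        # Ask the GS import context for the file's contents; a missing
        # file comes back as None and simply yields no rows.
        data = context.readDataFile(filename, subdir=subdir)
        if data is None:
            return []
        if isinstance(data, bytes):
            data = data.decode('utf-8')
        return list(csv.DictReader(io.StringIO(data)))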
Import contexts can also hand back an open file object (``openDataFile``)
instead of the full contents, and that's okay:
>>> from io import StringIO
>>> class FakeChunkedImportContext(object):
...     def __init__(self, subdir, filename, contents):
...         self.filename = filename
...         self.subdir = subdir
...         self.contents = contents
...     def openDataFile(self, filename, subdir=None):
...         if subdir is None and self.subdir is not None:
...             return None
...         if filename != self.filename:
...             return None
...         return StringIO(self.contents)
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeChunkedImportContext(None, 'somefile.csv',
... """animal,name
... fish,wanda
... """)
>>> t('collective.transmogrifier.sections.tests.csvsource.gs')
>>> print(handler)
logger INFO
  {'animal': 'fish', 'name': 'wanda'}
Attempting to load a nonexistent file won't do anything:
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext(None, 'someotherfile.csv',
... """animal,name
... cow,daisy
... pig,george
... duck,archibald
... """)
>>> t('collective.transmogrifier.sections.tests.csvsource.gs')
>>> print(handler)
If no import context is available, nothing is found either:
>>> handler.clear()
>>> t = Transmogrifier({})
>>> t('collective.transmogrifier.sections.tests.csvsource.gs')
>>> print(handler)
The filename can also be taken from a key on items coming from a previous
source section. A ``restkey`` option names the key under which rows collect
any values exceeding the fieldnames:
>>> csvsource = """
... [transmogrifier]
... include = collective.transmogrifier.sections.tests.csvsource.package
... pipeline =
... csvsource
... filename
... item-csvsource
... logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = collective.transmogrifier.tests:keysource.csv
...
... [filename]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:_item-csvsource
... condition = exists:item/_item-csvsource
... value = python:modules['os.path'].join(modules['os.path'].dirname(
... modules['collective.transmogrifier.tests'].__file__),
... item['_item-csvsource'])
...
... [item-csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... restkey = _args
... row-key = string:_csvsource
...
... """
>>> registerConfig('collective.transmogrifier.sections.tests.csvsource.key',
... csvsource)
>>> handler.clear()
>>> transmogrifier('collective.transmogrifier.sections.tests.csvsource.key')
>>> print(handler)
logger INFO
  {'_item-csvsource': '.../collective/transmogrifier/tests/sample.csv'}
logger INFO
  {'_csvsource': '.../collective/transmogrifier/tests/sample.csv',
   'bar': 'first-bar',
   'baz': 'first-baz',
   'foo': 'first-foo'}
logger INFO
  {'_args': ['corge', 'grault'],
   '_csvsource': '.../collective/transmogrifier/tests/sample.csv',
   'bar': 'second-bar',
   'baz': 'second-baz',
   'foo': 'second-foo'}
The ``fmtparam-`` expressions have access to the following:

=================== ==========================================================
 ``key``             the `fmtparams`_ attribute
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the csvsource section
 ``options``         the csvsource options
 ``modules``         sys.modules
=================== ==========================================================
The ``row-key`` and ``row-value`` expressions have access to the following:

=================== ==========================================================
 ``item``            the pipeline item to be yielded from this CSV row
 ``source``          the pipeline item the CSV filename was taken from
 ``transmogrifier``  the transmogrifier
 ``name``            the name of the csvsource section
 ``options``         the csvsource options
 ``modules``         sys.modules
 ``key``             (only for the ``row-value`` expression) the key being
                     inserted
=================== ==========================================================
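As a final, hypothetical variation on the example above, each yielded row
could be stamped with a different provenance key; the only change is the
``row-key`` expression, so each row would carry the resolved CSV filename
(which, judging by the output above, is the default row value) under
``_imported_from`` instead of ``_csvsource``::

    [item-csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    restkey = _args
    row-key = string:_imported_from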