#========================================================#
# Most TODO items have been converted to GitHub issues!  #
# See: https://github.com/liam2/liam2/issues             #
# Only "weird items" which do not really translate well  #
# to issues are left here for reference.                 #
#========================================================#
- pure Python mode. Something like this?

    class Person(Entity):
        age = Field(Int)

        def ageing(self):
            # self.age = self.age + 1
            self.age += 1
            self.agegroup = (self.age // 10) * 10

    def main():
        registry = EntityRegistry((Person, Household))
        for e in registry:
            e.load()
        p = Person()
        for period in range(start_period, periods):
            p.ageing()

    s = Simulation((Person, Household), 2010, 10)
    s.run()
- upgrade carray to blz
- when the move away from the conditional context is done,
I should make macros available via links. Maybe also merge
entity.all_symbols and entity.variables?
- replace breakpoint(period) by breakpoint(condition)
- add option to insert a conditional breakpoint for asserts (ie a failed
assert starts the console)
- integrate with IPython's interactive func: it rocks!
eg. interactive(func, arg0=rangeofvalues_for_that_arg, ...) produces
sliders/checkboxes/... for each argument.
see http://vimeo.com/79832657 (from 30 minutes onward); a sketch follows.
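A minimal sketch of the idea with today's ipywidgets API (the inspect
function and its value ranges are made-up placeholders, not liam2 code):

    # run in a notebook cell: sliders are generated from the value ranges
    from ipywidgets import interactive

    def inspect(period, coef):
        # placeholder: would display simulation output for these inputs
        print(period, coef)

    # period gets an int slider (2010..2020), coef a float slider (step 0.1)
    interactive(inspect, period=(2010, 2020), coef=(0.0, 1.0, 0.1))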
projects to keep an eye on
==========================
misc
----
- pandas: I could kill half of my code and it would bring many of the
features I want to develop (and more) for free, but this would need such a
major refactoring that I am reluctant to start it :)
- Blaze (http://blaze.pydata.org/) is a generalisation of ndarray with more
data layouts (potentially distributed), a delayed-evaluation system (like
the one in liam2) and a built-in code generation framework based on LLVM
(using Numba, I think). Documentation is a bit scarce at this point, but
from my quick glance at it, it seems to be exactly what I have been dreaming
of but would have been unable to achieve by myself.
- cytoolz: groupby, reduceby, join, ... composable and optimized "functional"
operations. Many are "streaming" (or partially streaming).
- Odin (equivalent of Blaze from Enthought)
https://github.com/enthought/distarray
- mosaik: https://mosaik.offis.de/ integrates multiple simulators
speed
-----
- Julia (http://julialang.org): a language specifically tailored for
scientific computing. It seems to have a very interesting feature set
(optional typing, fast resulting code, builtin parallelisation primitives,
...) but a somewhat ugly syntax. What I don't know though is whether its
"stdlib" is complete enough for our needs (csv input, hdf output, plotting,
...).
- Numba: early tests seem very promising.
- Parakeet: JIT for a subset of Python. Very similar to Numba, but more
limited in scope
- SymPy. They use mpmath and I wonder how fast it is.
- bottleneck (http://pypi.python.org/pypi/Bottleneck)
nansum, nanmin, nanmax, nanmean, nanstd
it will probably be integrated back into numpy/scipy at some point but that
will probably take a while, so let's use it in the meantime (see the sketch
below)
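A quick usage sketch (bottleneck's nan-aware functions are drop-in, faster
replacements for their numpy counterparts):

    import numpy as np
    import bottleneck as bn

    a = np.array([1.0, np.nan, 3.0])
    bn.nansum(a)   # 4.0 (the NaN is ignored)
    bn.nanmean(a)  # 2.0
    bn.nanmax(a)   # 3.0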
- corepy
- copperhead (targets NVIDIA GPUs & multi core CPUs)
http://copperhead.github.io/
- Theano (http://deeplearning.net/software/theano/)
it seems to be a PITA to install though (no pre-built packages, needs a
compiler, BLAS, etc...). It is only integrated with CUDA (ie NVidia)
- PyOpenCL: OpenCL seems pretty powerful and more standardised than CUDA
(it also works on some CPUs, not only GPUs).
- Clyther: translates python/numpy code to OpenCL
- pythran: python to C++ compiler
- shedskin: python to C++ compiler
- nuitka: python to C++ compiler
- scipy.weave.blitz. It seems a bit limited in functionality but it uses
blitz++ (http://www.oonumerics.org/blitz/) internally, which seems to
contain all I need and more, so adding the missing functionality might not
be too hard. It seems to be really slow to compile though, so it'll probably
only be worth it when using very large arrays.
- ORC (but I would need to create python bindings for it)
http://code.entropywave.com/projects/orc/
or any of the other C JIT referenced in the external links at:
http://en.wikipedia.org/wiki/Just-in-time_compilation
- Armadillo (http://arma.sourceforge.net/): approximately numpy for C++.
There are Python bindings though.
storage
-------
- SciDB: http://www.scidb.org
- pydap (http://www.pydap.org)
access many kinds of data formats using a uniform interface
GUI
---
- assess whether we can get inspiration (or even code) from spyke viewer:
http://neuralensemble.blogspot.be/2012/11/spyke-viewer-020-released.html
- try Kivy (kivy.org)
? glumpy for "live" visualisation during the simulation
undecided ideas
===============
The following is a list of improvement ideas I had at some point that I am
now unsure are worth implementing.
? try an alternate algorithm for RemoveIndividuals: simply reindex, like I do
elsewhere (see the sketch below). It might cause a problem if the individuals
with the last ids (newborns) die. It seems to be a tad slower in fact.
However, it might be a better starting point to translate to C if that is
ever necessary.
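A minimal numpy sketch of the "remove by reindexing" idea (the helper and its
field names are hypothetical, merely modelled on how liam2 maps ids to row
numbers):

    import numpy as np

    def remove_individuals(array, dead):
        # array is a structured array with an 'id' field;
        # dead is a boolean mask of the rows to drop
        survivors = array[~dead]
        # rebuild the id -> row number mapping; -1 marks missing ids.
        # sizing on the pre-removal max id keeps the mapping valid even
        # if the individuals with the last (newborn) ids die
        id_to_rownum = np.full(array['id'].max() + 1, -1, dtype=int)
        id_to_rownum[survivors['id']] = np.arange(len(survivors))
        return survivors, id_to_rownum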
? make numexpr able to compute several expressions at the same time
(see the workaround sketch below)
eg ne.evaluate(["a + (a * b)", "(a * b) * 5"], {'a': [1, 2], 'b': [3, 4]})
>> [4, 10], [15, 40]
>> this would be limited to expressions which do not "recompute" any of
>> the variables used
>> if factorization only happens within the group, some of it is lost:
>> if a subexpression is shared between an expression in the group and
>> one outside the group, it would not be factorized
Ideally, I could feed targets too, so that numexpr would be able to do
the whole thing by itself:
eg ne.evaluate([('c', 'a + (a * b)'),
                ('d', '(a * b) * c'),
                ('a', 'd + 1'),
                ('c', 'a * 2')], {'a': [1, 2], 'b': [3, 4]})
>> {'a': ..., 'c': ..., 'd': ...}
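In the meantime, the shared subexpression can be factored by hand with the
existing numexpr API (a sketch; ne.evaluate picks up variables from the
calling frame by default):

    import numpy as np
    import numexpr as ne

    a = np.array([1, 2])
    b = np.array([3, 4])

    # evaluate the common subexpression once, then reuse the result
    ab = ne.evaluate("a * b")
    r1 = ne.evaluate("a + ab")   # -> [ 4, 10]
    r2 = ne.evaluate("ab * 5")   # -> [15, 40]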
? make specifying output field types optional (I can compute them):
listing the fields is technically enough, though having the types makes
things easier and allows some type checks
? make specifying input fields optional. We could still check whether needed
fields are present (by analysing the simulation code), but we couldn't check
that the types of the fields are "correct". We would also need to postpone
"compiling" to numexpr (which uses types) until after the input data file is
loaded (or at least known and analysed). I like the idea because it brings us
closer to Python, but it would be a lot of work for only a small benefit.
? use stata regressions directly
* would introduce some kind of dependency on Stata which is not a good thing
IMO -> convert them first?
? importer: import all columns, except a few
? somehow make the "chenard" algorithm (used in align_abs(link=)) available in
standard alignment. I would have to make the iterator run on individuals
directly instead of on households. The code would be much simpler because we
wouldn't need num_unfillable, num_excedent, etc.
? two different values for missing links:
* linked to none/no individual = -1
* missing value = -2
A potential problem (even if the distinction has value -- I am unsure at
this point) is that modellers will probably not have the additional
information in their input data.
? temporaries which are carried over from one period to the next
ex: a "progressive" tsum, or a "progressive" duration:
    #minr61_c: "duration(minr61_d)"
    minr61_c: "if(minr61_d, minr61_c + 1, minr61_c)"
we do not (necessarily) need to store the value for each year (unless we
want to inspect it after the fact)
=> problem for the first period
? m2o link in aggregate link
eg, hh_nch: household.persons.count()
Another spelling which (should) already work(s):
    hh_nch: household.get(persons.count())
? a: "where(b, c[, d])" -> d implicitly = a
? specific choice method for booleans: boolchoice(0.3)
instead of: choice([True, False], [0.3, 1 - 0.3]) (see the sketch below)
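A minimal numpy sketch of what boolchoice could desugar to (the name and
signature are just this TODO's proposal, not an existing function):

    import numpy as np

    def boolchoice(p, size=None, rng=None):
        # True with probability p, False otherwise: shorthand for
        # choice([True, False], [p, 1 - p])
        rng = rng or np.random.default_rng()
        return rng.random(size) < p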
? allow processes to be run only at specific periodicity: eg. some monthly,
some quarterly, and some annually
? store alignment tables in hdf input & output files
* it seems like a good idea to store them in the output file, so that we can
know for sure what data was used to generate the output
* however, requiring alignment data to be in the input hdf file might be
less convenient for modellers who have their source data in csv, as it
would require an extra import step.
* on the other hand, having all input data in one file is convenient if
they want to pass their simulation around
? explicit "fill missing values" so that you can do operations between vectors
of a past period.
? OR, we could "timestamp" each expression and convert only when using both
old and current ones in the same expression
? move expression optimisation/simplification to numexpr
abandoned ideas
===============
The following is a list of improvement ideas I had at some point that I no
longer want to implement because they are either plain bad ideas or would
take too much time to implement for very little benefit.
? allow setting temporary variables in new()
? at the start of each period, wipe out the array (set all to missing)
=> need to use lag explicitly
? implicit "dead" field. Not sure it is really necessary, as some simulations
might not involve the creation or death of individuals. And some people
might want to name it differently: alive, ...
? implicit age: I don't like the idea but Geert would like to have it.
? automatically add dependencies to simulation? (eg add person.difficult_match
when person.partner_id is computed)
? allow lag to use temporary variables => it's better to only allow macros
? "new" action is limited because it can only have one parent, it might be
interesting to have an arbitrary number of them. eg:
newbirth: "new('person', filters={'mother': to_give_birth & ~male,
'father': to_give_birth & male},
mother_id=mother.id,
father_id=father.id,
male=logit(0.0),
partner_id=-1)"
however, this might be useless/impractical, as eg, the father filter above
is wrong, and if an individual is created from several parents, they will
usually (always?) need to be matched beforehand, so one or several fields
or links will be available to get to the other parents. Consequently, this
syntax might be more appropriate:
newbirth: "new('person', filter={'mother': to_give_birth},
links={'father': mother.partner},
mother_id=mother.id,
father_id=father.id,
male=logit(0.0),
partner_id=-1)"
... but in that case, the current syntax works as well.
? Q: use validictory instead of my custom validator?
http://readthedocs.org/docs/validictory/
A: probably not. I prefer my own solution, as the syntax is more concise and
(dare I say) more pythonic. It could be extended if needed (a toy sketch of
Or follows):
- Or(a, b, c) to have both enumeration of values and alternate syntaxes:
    Or(str, {'type': str, 'initialdata': bool})
    Or('many2one', 'one2many')
- List(x, y, z, option1=value1) if I need extra options for lists
- Dict({}, option1=value1) if I need extra options for dicts
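A toy sketch of how such an Or combinator could work (illustrative only, not
the actual liam2 validator code):

    class Or(object):
        def __init__(self, *alternatives):
            self.alternatives = alternatives

    def check(value, schema):
        # a schema is an Or, a type, a dict of sub-schemas or a literal value
        if isinstance(schema, Or):
            return any(check(value, alt) for alt in schema.alternatives)
        elif isinstance(schema, type):
            return isinstance(value, schema)
        elif isinstance(schema, dict):
            return isinstance(value, dict) and \
                all(key in value and check(value[key], sub)
                    for key, sub in schema.items())
        else:
            return value == schema  # enumerated literal value

    # eg. check('many2one', Or('many2one', 'one2many')) -> True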