| Age | Commit message (Collapse) | Author | 
|---|
|  | algorithms. | 
|  |  | 
|  |  | 
|  | In this implementation, the BVH is represented in a different way.
Unlike the standard implementation, this BVH is constrained to be very
uniform.  Every node has the same number of children, B, except the
leaf layer, where each leaf node wraps exactly one triangle.  All
leaves are at the same depth, and dummy nodes are inserted into the
tree in order to meet the requirement that all nodes have B children.
By requiring each triangle be wrapped by a bounding box, we increase
the number of nodes required to represent a geometry quite
dramatically.  To compensate, we cut the storage requirement per node
by storing the following per-node properties:
1) 6 * 16 bits: x, y, z upper and lower bounds in a 16-bit fixed-point
representation.  The Geometry struct contains the global offset and
scale factor required to transform the fixed point values back into
floats.  The bounding values are rounded to ensure they still fully
contain the triangle vertices, even though the vertices are
represented with greater precision than the bounding boxes.
2) 1 bit: Leaf bit.  If not set, this node has child nodes, otherwise
this node is a leaf node wrapping a single triangle.
3) 31 bits: The index of the first child of this node in the node
list.  All other children of the node immediately follow the first
child, and there are always exactly B of them.  If the leaf bit is
set, then the child index refers to the triangle list, rather than the
node list.
A dummy node is defined to have the x lower bound = x upper bound = 0.
All dummy children of a node will be consecutive at the end of the
list of children, so once the first dummy is encountered, the
remaining children can be skipped.
To build this BVH, we use the GPU for assistance.  One kernel computes
the boundaries of the leaf nodes and the Morton codes from the
triangle list.  Since GPU storage is severely strained for large
detectors, we then pull back these arrays to the CPU, sort the leaf
nodes in Morton-order, delete the GPU arrays, and then allocate the
full node list on the GPU.  The BVH is then built in-place on the GPU
layer by layer, moving bottom to top.
The BVH for LBNE can be constructed in about 20 seconds in this
fashion, which is actually faster than the standard BVH can be
deserialized from disk.
Unfortunately, the benchmark geometry requires so many nodes (due to
one leaf per triangle), that the BVH leaves almost no room on the GPU
for the actual triangles or vertices.  As a result, the current code
actually stores these arrays on the CPU and *maps them into the GPU
address space*.  That sounds kind of crazy, but given the bounding of
each triangle in a box, the number of triangles that actually need to
be fetched over the PCI-E bus is fairly low.  For a branching factor
of 3, the BVH needs to test ~400 nodes and only 3-7 triangles to
project a single ray.  If the off-board storage of the triangles and
vertices can be preserved, then it might be possible in the future to
squeeze LBNE onto a standard 1.5 GB GTX 580.
The current mesh intersection code has some severe performance issues
at the moment.  It is about 5-10x slower than the standard BVH code
for reasons that are not immediately clear, although there are plenty
of candidates.
Some investigation is in order, but I will probably just skip right
over to the ultimate goal of this effort: parallel BVH search.
Instead of assigning one ray per thread, I would like to assign one
ray per 32-thread block, and use the 32-threads to cooperatively
search the BVH.  This requires a BVH with branch factor of 32, and a
very regular layout.  That wide of a tree requires more nodes to be
evaluated (~1200 instead of 400), but will hopefully have more
streamlined memory access patterns and less warp divergence.
It is not clear whether this experiment will actually speed things
up... | 
|  |  | 
|  | global import of RootReader in the chroma.camera module.
Bringing in ROOT adds 14 seconds to the import time, which really
sucks if you don't need it. | 
|  | function.  This makes it easier to experiment with external BVH construction implementations, which only need the flat list of triangles. | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | scatter" and a "forced no-scatter" pass.
Since we want to include contributions from two populations of weighted
photons, we have to break up the DAQ simulation into three functions:
 * begin_acquire()
 * acquire()
 * end_acquire()
The first function resets the channel states, the second function
accumulates photoelectrons (and can be called multiple times), and the
last function returns the hit information.  A global weight has also
been added to the DAQ simulation if a particular set of weighted
photons need to have an overall penalty.
The forced scattering pass can be repeated many times on the same
photons (with the photons individually deweighted to compensate).
This reduces the variance on the final likelihoods quite a bit. | 
|  | state (hit or not), rather than a floor on the hit probability.
A channel that is impossible to hit should have zero probability in
the likelihood and not be hit in the actual data.  Before this change,
that channel would be forced to have a hit probability of 0.5 /
ntotal, which is wrong.  We only need to ensure the probability of the
observed state of the channel is not zero so that the log() function
is defined. | 
|  | same angle for greater efficiency. | 
|  | step during photon propagation. | 
|  |  | 
|  |  | 
|  |  | 
|  | For consistence, weights must be less than or equal to one at all
times.  When weight calculation is enabled by the likelihood
calculator, photons are prevented from being absorbed, and instead
their weight is decreased to reflect their probability of survival.
Once the photon's weight drops below a given threshold (0.01 for now),
the weighting procedure is stopped and the photon can be extinguished.
With weighting enabled, the detection efficiency of surfaces is also
applied to the weight, and the photon terminated with the DETECT bit
set in the history.  This is not completely accurate, as a photon
could pass through the surface, reflect, and reintersect the surface
later (or some other PMT) and be detected.  As a result, weighting
will slightly underestimate PMT efficiency compared to the true Monte
Carlo.  This is not intrinsic to the weighting procedure, but only
comes about because of the inclusion of detection efficiency into the
weight.
Without the detection efficiency included, weighting cuts in half the
number of evaluations required to achieve a given likelihood
uncertainty (at least for the hit probabilities).  Add in the
detection efficiency, and that factor becomes 1/5 or 1/6! | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | the mathematics convention for the angle names.  Flipping theta and phi back to their correct meaning. | 
|  | expecting. | 
|  | you will be forking processes at multiple locations.  This killed one of the unit tests. | 
|  |  | 
|  | remove requirement to use numpy >= 1.6 | 
|  | Press "s" and average time, charge and hit probability are computed
for all of the events in the input file.  Then you can cycle through
the different values using the "." key, just as you can for single
events.  You can flip between sum and regular modes, and the sum
calculation is only done the first time. | 
|  | Now you can do:
  reader = RootReader('file.root')
  for ev in reader:
      # Do stuff | 
|  | in event viewer | 
|  |  | 
|  |  | 
|  |  | 
|  | to speed up likelihood evaluation.
When generating a likelihood, the DAQ can be run many times in parallel
by the GPU, creating a large block of channel hit information in memory.
The PDF accumulator processes that entire block in two passes:
  * First update the channel hit count, and the count of channel hits
    falling into the bin around the channel hit time being evaluated.
    Add any channel hits that should also be included in the n-th
    nearest neighbor calculation to a channel-specific work queue.
  * Process all the work queues for each channel and update the
    list of nearest neighbors.
This is hugely faster than what we were doing before.  Kernel
estimation (or some kind of orthogonal function expansion of the PDF)
should be better ultimately, but for now the nearest neighbor approach
to PDF estimation seems to be working the best. | 
|  | that all have a particular interaction process code set.  Handy for selection just the detected photons from a large list of photons. | 
|  |  | 
|  | still look off, but this is an improvement. | 
|  |  | 
|  |  | 
|  | separate function for easier debugging of channel-level likelihood behavior. | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | likelihood | 
|  |  |