chroma - Chroma is a high performance optical photon simulation for particle physics detectors

Age	Commit message (Collapse)	Author
2021-05-09	Pull memoization decorator out to tools module.	Stan Seibert

2021-05-09	Simple BVH construction passing the "smoke test"	Stan Seibert

2021-05-09	Simple BVH generator using new infrastructure	Stan Seibert

2021-05-09	Convert bvh module into a package directory.	Stan Seibert

2021-05-09	BVH and BVHLayer classes with unittests.	Stan Seibert
	These classes will be the data types used by the new BVH generation code.
2021-05-09	Basic BVH cache. Further implementation requires creation of BVH class.	Stan Seibert

2021-05-09	Cache class for managing the geometry cache on disk.	Stan Seibert
	The storage format is changing relative to the old format, so all geometry files will be saved in the ~/.chroma/geo directory. For now, the format is just a simple pickle. We know this is not optimal for space or speed, but the Geometry class will be changing soon, and we can optimize further after that. This Cache class will also soon manage the separate BVH cache.
2021-05-09	Add md5() hash method to chroma.geometry.Mesh	Stan Seibert

2021-05-09	Pull BVH calculation out of GPUGeometry class and make the world scale ↵	Stan Seibert
	factor the same in all dimensions so that distance and area calculations are easy to do.
2021-05-09	Save offset of each layer in the CPU-computed BVH for use when comparing BVH ↵	Stan Seibert
	algorithms.
2021-05-09	Replace the use of pointers in the intersect_mesh with array indices.	Stan Seibert

2021-05-09	Saving the material codes and colors in page-locked host memory is unnecessary.	Stan Seibert

2021-05-09	First steps toward an alternative BVH implementation.	Stan Seibert
	In this implementation, the BVH is represented in a different way. Unlike the standard implementation, this BVH is constrained to be very uniform. Every node has the same number of children, B, except the leaf layer, where each leaf node wraps exactly one triangle. All leaves are at the same depth, and dummy nodes are inserted into the tree in order to meet the requirement that all nodes have B children. By requiring each triangle be wrapped by a bounding box, we increase the number of nodes required to represent a geometry quite dramatically. To compensate, we cut the storage requirement per node by storing the following per-node properties: 1) 6 * 16 bits: x, y, z upper and lower bounds in a 16-bit fixed-point representation. The Geometry struct contains the global offset and scale factor required to transform the fixed point values back into floats. The bounding values are rounded to ensure they still fully contain the triangle vertices, even though the vertices are represented with greater precision than the bounding boxes. 2) 1 bit: Leaf bit. If not set, this node has child nodes, otherwise this node is a leaf node wrapping a single triangle. 3) 31 bits: The index of the first child of this node in the node list. All other children of the node immediately follow the first child, and there are always exactly B of them. If the leaf bit is set, then the child index refers to the triangle list, rather than the node list. A dummy node is defined to have the x lower bound = x upper bound = 0. All dummy children of a node will be consecutive at the end of the list of children, so once the first dummy is encountered, the remaining children can be skipped. To build this BVH, we use the GPU for assistance. One kernel computes the boundaries of the leaf nodes and the Morton codes from the triangle list. Since GPU storage is severely strained for large detectors, we then pull back these arrays to the CPU, sort the leaf nodes in Morton-order, delete the GPU arrays, and then allocate the full node list on the GPU. The BVH is then built in-place on the GPU layer by layer, moving bottom to top. The BVH for LBNE can be constructed in about 20 seconds in this fashion, which is actually faster than the standard BVH can be deserialized from disk. Unfortunately, the benchmark geometry requires so many nodes (due to one leaf per triangle), that the BVH leaves almost no room on the GPU for the actual triangles or vertices. As a result, the current code actually stores these arrays on the CPU and maps them into the GPU address space. That sounds kind of crazy, but given the bounding of each triangle in a box, the number of triangles that actually need to be fetched over the PCI-E bus is fairly low. For a branching factor of 3, the BVH needs to test ~400 nodes and only 3-7 triangles to project a single ray. If the off-board storage of the triangles and vertices can be preserved, then it might be possible in the future to squeeze LBNE onto a standard 1.5 GB GTX 580. The current mesh intersection code has some severe performance issues at the moment. It is about 5-10x slower than the standard BVH code for reasons that are not immediately clear, although there are plenty of candidates. Some investigation is in order, but I will probably just skip right over to the ultimate goal of this effort: parallel BVH search. Instead of assigning one ray per thread, I would like to assign one ray per 32-thread block, and use the 32-threads to cooperatively search the BVH. This requires a BVH with branch factor of 32, and a very regular layout. That wide of a tree requires more nodes to be evaluated (~1200 instead of 400), but will hopefully have more streamlined memory access patterns and less warp divergence. It is not clear whether this experiment will actually speed things up...
2021-05-09	Improve identification of CUDA vector types in struct builder	Stan Seibert

2021-05-09	Remove import of chroma.io.root in the __init__.py module and also the	Stan Seibert
	global import of RootReader in the chroma.camera module. Bringing in ROOT adds 14 seconds to the import time, which really sucks if you don't need it.
2021-05-09	Break out the construction of the geometry mesh into a separate flatten() ↵	Stan Seibert
	function. This makes it easier to experiment with external BVH construction implementations, which only need the flat list of triangles.
2021-05-09	Allow make_gpu_struct to pack structs containing the CUDA vector types.	Stan Seibert

2021-05-09	Put time PDF back into likelihood calculation	Stan Seibert

2021-05-09	Reduce number of scattering passes by default in likelihood calculation	Stan Seibert

2021-05-09	Put back accumulation of time PDF information in PDF calculator	Stan Seibert

2021-05-09	Fix bug in stepping through photons during multi-DAQ simulation.	Stan Seibert

2021-05-09	Lower the weight threshold cutoff to propagate even more improbable photons	Stan Seibert

2021-05-09	Split photon propagation in the likelihood calculation into a "forced	Stan Seibert
	scatter" and a "forced no-scatter" pass. Since we want to include contributions from two populations of weighted photons, we have to break up the DAQ simulation into three functions: * begin_acquire() * acquire() * end_acquire() The first function resets the channel states, the second function accumulates photoelectrons (and can be called multiple times), and the last function returns the hit information. A global weight has also been added to the DAQ simulation if a particular set of weighted photons need to have an overall penalty. The forced scattering pass can be repeated many times on the same photons (with the photons individually deweighted to compensate). This reduces the variance on the final likelihoods quite a bit.
2021-05-09	Apply a floor to the probability of each channel being in the observed	Stan Seibert
	state (hit or not), rather than a floor on the hit probability. A channel that is impossible to hit should have zero probability in the likelihood and not be hit in the actual data. Before this change, that channel would be forced to have a hit probability of 0.5 / ntotal, which is wrong. We only need to ensure the probability of the observed state of the channel is not zero so that the log() function is defined.
2021-05-09	Use the sincosf() function when computing both the sin() and cos() of the ↵	Stan Seibert
	same angle for greater efficiency.
2021-05-09	Allow for option of forced scattering or forced non-scattering on the first ↵	Stan Seibert
	step during photon propagation.
2021-05-09	Eliminate spurious commas in definition of photon history bits	Stan Seibert

2021-05-09	Skip photon propagation queue if we are propagating weighted photons	Stan Seibert

2021-05-09	Use bigger blocks for multi-DAQ calculation	Stan Seibert

2021-05-09	Add optional weight to each photon being propagated.	Stan Seibert
	For consistence, weights must be less than or equal to one at all times. When weight calculation is enabled by the likelihood calculator, photons are prevented from being absorbed, and instead their weight is decreased to reflect their probability of survival. Once the photon's weight drops below a given threshold (0.01 for now), the weighting procedure is stopped and the photon can be extinguished. With weighting enabled, the detection efficiency of surfaces is also applied to the weight, and the photon terminated with the DETECT bit set in the history. This is not completely accurate, as a photon could pass through the surface, reflect, and reintersect the surface later (or some other PMT) and be detected. As a result, weighting will slightly underestimate PMT efficiency compared to the true Monte Carlo. This is not intrinsic to the weighting procedure, but only comes about because of the inclusion of detection efficiency into the weight. Without the detection efficiency included, weighting cuts in half the number of evaluations required to achieve a given likelihood uncertainty (at least for the hit probabilities). Add in the detection efficiency, and that factor becomes 1/5 or 1/6!
2021-05-09	Disable time contribution to likelihood for weighting tests.	Stan Seibert

2021-05-09	More documentation of event data structure.	Stan Seibert

2021-05-09	Docstring for chroma.event.Photons	Stan Seibert

2021-05-09	Fix ncorrect argument order to arctan2	Stan Seibert

2021-05-09	Argh! Someone used spherical polar coordinates in the pi0 generator with ↵	Stan Seibert
	the mathematics convention for the angle names. Flipping theta and phi back to their correct meaning.
2021-05-09	Cast channel ID array to uint32, as that is technically what the C++ code is ↵	Stan Seibert
	expecting.
2021-05-09	Do not use the zmq.Context.instance() method to obtain a ZeroMQ context when ↵	Stan Seibert
	you will be forking processes at multiple locations. This killed one of the unit tests.
2021-05-09	Cast numpy sum to Python int to make PyROOT happy.	Stan Seibert

2021-05-09	Replace all uses of numpy.count_nonzero() with chroma.tools.count_nonzero to ↵	Stan Seibert
	remove requirement to use numpy >= 1.6
2021-05-09	Add a summing mode to the event view.	Stan Seibert
	Press "s" and average time, charge and hit probability are computed for all of the events in the input file. Then you can cycle through the different values using the "." key, just as you can for single events. You can flip between sum and regular modes, and the sum calculation is only done the first time.
2021-05-09	Make chroma.io.root.RootReader follow the Python iterator protocol.	Stan Seibert
	Now you can do: reader = RootReader('file.root') for ev in reader: # Do stuff
2021-05-09	Make the period key cycle between charge, time and hit/no-hit color display ↵	Stan Seibert
	in event viewer
2021-05-09	Fix silly bug in RootReader.jump_to()	Stan Seibert

2021-05-09	Add optional argument to pi0 gun to pick direction of one gamma in rest frame.	Stan Seibert

2021-05-09	Allow chroma to be imported on systems without pygame. Good for clusters.	Stan Seibert

2021-05-09	Major overhaul to the way that DAQ and PDF information is accumulated	Stan Seibert
	to speed up likelihood evaluation. When generating a likelihood, the DAQ can be run many times in parallel by the GPU, creating a large block of channel hit information in memory. The PDF accumulator processes that entire block in two passes: * First update the channel hit count, and the count of channel hits falling into the bin around the channel hit time being evaluated. Add any channel hits that should also be included in the n-th nearest neighbor calculation to a channel-specific work queue. * Process all the work queues for each channel and update the list of nearest neighbors. This is hugely faster than what we were doing before. Kernel estimation (or some kind of orthogonal function expansion of the PDF) should be better ultimately, but for now the nearest neighbor approach to PDF estimation seems to be working the best.
2021-05-09	Add a select function to GPUPhotons to extract a reduced list of photons ↵	Stan Seibert
	that all have a particular interaction process code set. Handy for selection just the detected photons from a large list of photons.
2021-05-09	Use global per-thread ZeroMQ context rather than make a new one.	Stan Seibert

2021-05-09	Minor fixes to doing time & charge PDFs via kernel estimation. Things	Stan Seibert
	still look off, but this is an improvement.
2021-05-09	OK, we really do need to call the garbage collector this frequently.	Stan Seibert