summaryrefslogtreecommitdiff
path: root/chroma
AgeCommit message (Collapse)Author
2021-05-09Improve startup time in simulation by letting GEANT4 processes initialize in ↵Stan Seibert
background.
2021-05-09BVH optimization to sort child nodes by area. Only has a small effect.Stan Seibert
2021-05-09Collapse chains of BVH nodes with single children.Stan Seibert
2021-05-09Fix bug in grid BVH implementation. Now half as fast as old Chroma.Stan Seibert
2021-05-09Speed up recursive grid BVH generation. Now LBNE BVH can be generated in ↵Stan Seibert
about 60-90 seconds.
2021-05-09New BVH algorithm: Recursive GridStan Seibert
This is an adaptation of the original Chroma BVH construction algorithm. The generation stage is very slow, but can be fixed.
2021-05-09Import memoize decorator directly from pytools as it is not present in the ↵Stan Seibert
old location in newer pycuda releases.
2021-05-09Implementation of "node splitting" which places children into separateStan Seibert
parent nodes if combining them would result in a parent node that is excessively large compared to the surface area of the children. This doesn't help as much as you might imagine.
2021-05-09Bugfixes to BVH traversal and generation code.Stan Seibert
2021-05-09Use degree 2 tree by defaultStan Seibert
2021-05-09Redo node format to include number of children, rather than just leaf bit.Stan Seibert
2021-05-09Add a chroma-bvh hist function that displays a ROOT histogram of theStan Seibert
areas of the BVH nodes in a particular layer of the tree.
2021-05-09Rename node_area() to node_areas() and make it return the array of node areasStan Seibert
2021-05-09Skip L1 cache when loading nodes.Stan Seibert
Node access is very irregular as each thread descends the BVH tree. Each node is only 16 bytes, so the 128 byte cache line size in the L1 cache means that a lot of useless data is often fetched. Using some embedded PTX, we can force the L1 cache to be skipped, going directly to L2. The L2 cache line is 32 bytes long, which means that both children in a binary tree will be cached at the same time. This improves the speed on the default generated binary trees, but does not help an optimized tree yet.
2021-05-09Missing return in functionStan Seibert
2021-05-09Oops, pdb line snuck into commit.Stan Seibert
2021-05-09Make Geometry.flatten() idempotent and avoid loading the BVH twice when ↵Stan Seibert
using the Camera class.
2021-05-09Fix the rendering code to use the new geometry and BVH definition.Stan Seibert
2021-05-09Speed up node intersection by 2.5x using tips from "Optimizing rayStan Seibert
tracing for CUDA" by Hannu Saransaari. The intersect_box() function has been rewritten to be much shorter and use the min() and max() functions, which map directly to hardware instructions. Additionally, the calculations inside intersect_box() have been reorganized to allow the compiler to use the combined multiply-add instruction, instead of doing a subtraction followed by a division (which is way slower).
2021-05-09Use fixed point BVH surface area instead of world coordinates for ↵Stan Seibert
consistency everywhere.
2021-05-09Increase look ahead in node_swap BVH optimization to improve surface area ↵Stan Seibert
minimization
2021-05-09Add more BVH manipulation commands:Stan Seibert
* chroma-bvh create [name] [degree] - Creates a new BVH with the specified branching degree. * chroma-bvh node_swap [name] [layer] - Optimizes a BVH layer with a "greedy, short-sighted" algorithm that swaps around nodes to minimize the surface area of the immediate parent layer. Rebuilds the tree above the modified layer when finished. Also modified the chroma-bvh stat command to print the sum of the logarithms of the areas of each layer. It seems to be a rough predictor of the simulation speed of the BVH.
2021-05-09Fix silly typos in BVH loading functionsStan Seibert
2021-05-09Do not cache the individual solids. Speeds up unpickling significantly.Stan Seibert
2021-05-09Do not save the BVH when caching the geometry.Stan Seibert
2021-05-09Make load from string method handle functions that return Mesh, Solid, or ↵Stan Seibert
Geometry.
2021-05-09Add another loader function that is useful for chroma-cam.Stan Seibert
Note that rendering is still broken by the new BVH format.
2021-05-09Implement new loader function that turns a string into a Geometry,Stan Seibert
searching through files, named geometries in the cache, and geometry creation functions. The loader function also is responsible for fetching or creating a BVH to go with the geometry. This commit also removes some code that has been replaced by the new system. Other bits will come back in future commits.
2021-05-09Pull memoization decorator out to tools module.Stan Seibert
2021-05-09Simple BVH construction passing the "smoke test"Stan Seibert
2021-05-09Simple BVH generator using new infrastructureStan Seibert
2021-05-09Convert bvh module into a package directory.Stan Seibert
2021-05-09BVH and BVHLayer classes with unittests.Stan Seibert
These classes will be the data types used by the new BVH generation code.
2021-05-09Basic BVH cache. Further implementation requires creation of BVH class.Stan Seibert
2021-05-09Cache class for managing the geometry cache on disk.Stan Seibert
The storage format is changing relative to the old format, so all geometry files will be saved in the ~/.chroma/geo directory. For now, the format is just a simple pickle. We know this is not optimal for space or speed, but the Geometry class will be changing soon, and we can optimize further after that. This Cache class will also soon manage the separate BVH cache.
2021-05-09Add md5() hash method to chroma.geometry.MeshStan Seibert
2021-05-09Pull BVH calculation out of GPUGeometry class and make the world scale ↵Stan Seibert
factor the same in all dimensions so that distance and area calculations are easy to do.
2021-05-09Save offset of each layer in the CPU-computed BVH for use when comparing BVH ↵Stan Seibert
algorithms.
2021-05-09Replace the use of pointers in the intersect_mesh with array indices.Stan Seibert
2021-05-09Saving the material codes and colors in page-locked host memory is unnecessary.Stan Seibert
2021-05-09First steps toward an alternative BVH implementation.Stan Seibert
In this implementation, the BVH is represented in a different way. Unlike the standard implementation, this BVH is constrained to be very uniform. Every node has the same number of children, B, except the leaf layer, where each leaf node wraps exactly one triangle. All leaves are at the same depth, and dummy nodes are inserted into the tree in order to meet the requirement that all nodes have B children. By requiring each triangle be wrapped by a bounding box, we increase the number of nodes required to represent a geometry quite dramatically. To compensate, we cut the storage requirement per node by storing the following per-node properties: 1) 6 * 16 bits: x, y, z upper and lower bounds in a 16-bit fixed-point representation. The Geometry struct contains the global offset and scale factor required to transform the fixed point values back into floats. The bounding values are rounded to ensure they still fully contain the triangle vertices, even though the vertices are represented with greater precision than the bounding boxes. 2) 1 bit: Leaf bit. If not set, this node has child nodes, otherwise this node is a leaf node wrapping a single triangle. 3) 31 bits: The index of the first child of this node in the node list. All other children of the node immediately follow the first child, and there are always exactly B of them. If the leaf bit is set, then the child index refers to the triangle list, rather than the node list. A dummy node is defined to have the x lower bound = x upper bound = 0. All dummy children of a node will be consecutive at the end of the list of children, so once the first dummy is encountered, the remaining children can be skipped. To build this BVH, we use the GPU for assistance. One kernel computes the boundaries of the leaf nodes and the Morton codes from the triangle list. Since GPU storage is severely strained for large detectors, we then pull back these arrays to the CPU, sort the leaf nodes in Morton-order, delete the GPU arrays, and then allocate the full node list on the GPU. The BVH is then built in-place on the GPU layer by layer, moving bottom to top. The BVH for LBNE can be constructed in about 20 seconds in this fashion, which is actually faster than the standard BVH can be deserialized from disk. Unfortunately, the benchmark geometry requires so many nodes (due to one leaf per triangle), that the BVH leaves almost no room on the GPU for the actual triangles or vertices. As a result, the current code actually stores these arrays on the CPU and *maps them into the GPU address space*. That sounds kind of crazy, but given the bounding of each triangle in a box, the number of triangles that actually need to be fetched over the PCI-E bus is fairly low. For a branching factor of 3, the BVH needs to test ~400 nodes and only 3-7 triangles to project a single ray. If the off-board storage of the triangles and vertices can be preserved, then it might be possible in the future to squeeze LBNE onto a standard 1.5 GB GTX 580. The current mesh intersection code has some severe performance issues at the moment. It is about 5-10x slower than the standard BVH code for reasons that are not immediately clear, although there are plenty of candidates. Some investigation is in order, but I will probably just skip right over to the ultimate goal of this effort: parallel BVH search. Instead of assigning one ray per thread, I would like to assign one ray per 32-thread block, and use the 32-threads to cooperatively search the BVH. This requires a BVH with branch factor of 32, and a very regular layout. That wide of a tree requires more nodes to be evaluated (~1200 instead of 400), but will hopefully have more streamlined memory access patterns and less warp divergence. It is not clear whether this experiment will actually speed things up...
2021-05-09Improve identification of CUDA vector types in struct builderStan Seibert
2021-05-09Remove import of chroma.io.root in the __init__.py module and also theStan Seibert
global import of RootReader in the chroma.camera module. Bringing in ROOT adds 14 seconds to the import time, which really sucks if you don't need it.
2021-05-09Break out the construction of the geometry mesh into a separate flatten() ↵Stan Seibert
function. This makes it easier to experiment with external BVH construction implementations, which only need the flat list of triangles.
2021-05-09Allow make_gpu_struct to pack structs containing the CUDA vector types.Stan Seibert
2021-05-09Put time PDF back into likelihood calculationStan Seibert
2021-05-09Reduce number of scattering passes by default in likelihood calculationStan Seibert
2021-05-09Put back accumulation of time PDF information in PDF calculatorStan Seibert
2021-05-09Fix bug in stepping through photons during multi-DAQ simulation.Stan Seibert
2021-05-09Lower the weight threshold cutoff to propagate even more improbable photonsStan Seibert