Eric Fellheimer and Mark Rudner
PSiQCoPATH is written in C++ with the MPI library of communication directives to control inter-processor communication. Throughout the design process, we tried to maintain the robustness and adaptability of our code through the techniques of object-oriented programming (OOP). To this end, we invested a significant amount of time in the design of proper abstractions in our system. The role of object-oriented programming in the development of PSiQCoPATH is discussed further below.
The input file format for PSiQCoPATH is simple and straightforward. Whitespace is used as the delimiter between entries. Complex numbers such as $a + bi$ are supplied as ordered pairs of their real and imaginary parts in parentheses, i.e. $(a\ b)$.
Although there may be more efficient ways to store the input data, taking the most straightforward approach left us more time to concentrate on the algorithms and parallelization of our code. Furthermore, the simple whitespace-delimited text file approach should not lead to any problems if PSiQCoPATH is ported across platforms. The accompanying figure shows an example input file for a very simple test run.
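As a concrete sketch of how such an entry might be read, the following is a minimal illustration (our assumption, not the actual PSiQCoPATH parser: we suppose each complex entry is two whitespace-delimited tokens of the form `(re` and `im)`):

```cpp
#include <complex>
#include <istream>
#include <string>

// Read one complex entry written as "(re im)" from a whitespace-delimited
// stream.  The tokenization assumed here is our own guess at the format.
std::complex<double> readComplex(std::istream& in) {
    std::string a, b;
    in >> a >> b;                                       // e.g. "(1.0" and "-0.5)"
    double re = std::stod(a.substr(1));                 // strip leading "("
    double im = std::stod(b.substr(0, b.size() - 1));   // strip trailing ")"
    return std::complex<double>(re, im);
}
```

For example, the text `(1.0 -0.5)` would be read as the complex number $1.0 - 0.5i$.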
The entry on the first line is used to set the value of the isAlpha flag in the code. The value of this flag determines whether the program will run in Hamiltonian evolution mode or quantum circuit mode. In this example, the input ``alpha'' is used to set isAlpha to 1 and hence to instruct PSiQCoPATH to run in Hamiltonian evolution mode. In this case, the rest of the input is used to set up the parameters of the desired time-dependent Hamiltonian.
If ``qga'' is instead specified, the program will run in quantum circuit mode. In this case, the remainder of the input file contains a series of unitary quantum gates to be applied to the system.
The next three lines specify $N$, the size of the Hilbert space; $M$, the number of terms in the parametric expansion of the Hamiltonian; and $T$, the number of time steps to be calculated. In the example, the Hilbert space is two-dimensional and the Hamiltonian is parametrized by a single basis term. In the line that follows, the user specifies the value of the time step $\Delta t_j$ at each of the $T$ time steps. Note that this allows the possibility of non-uniform time steps. In the example, however, the time steps are equally sized.
The next three lines are used to specify, for the first basis Hamiltonian, the values of its coefficient and of the coefficient's first and second time derivatives at each time step; the example values are shown in the figure.
Next, the matrix representation of the first ``basis'' Hamiltonian is supplied. In the example, the two-dimensional identity matrix is used. In the general case where more than one basis Hamiltonian is required, the coefficients and basis Hamiltonians of the remaining pieces of the total Hamiltonian are supplied next. The format for each piece is the same - a list of values for the coefficients and their derivatives followed by the matrix representation of the basis Hamiltonian.
The final few lines determine the parameters of the output. First, an output filename is specified. Next, the starting point of the output and the frequency at which to store the state of the system are given. In the example, the code is instructed to start the output at the first time step, and only save one state vector.
Finally, the value of the outputType flag is set. The value ``evolution'' instructs the code to perform a full time evolution matrix calculation. Alternatively, the value ``state'' instructs the code to perform a single state evolution. In this case, the code will then read in the column vector corresponding to the desired initial state, as in the example shown.
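To make the layout concrete, the sketch below shows what an input file of this form might look like. Every value here is invented for illustration and does not reproduce the actual example figure:

```
alpha
2
1
4
0.1 0.1 0.1 0.1
1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
(1 0) (0 0)
(0 0) (1 0)
output.txt
1 4
state
(1 0)
(0 0)
```

Reading top to bottom: the mode flag; the Hilbert space dimension, number of basis Hamiltonians, and number of time steps; the time step sizes; the coefficient and its first and second derivatives at each step; the matrix rows of the (identity) basis Hamiltonian; the output filename, output start and frequency; the output type; and the initial state vector.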
The figure above is a very simple example. For more realistic simulations of larger systems over longer time intervals, it was necessary to write additional scripts to generate the input files automatically. Examples of such scripts are grover.m (written in Matlab) and genAdQCInFile.cpp (written in C++). These scripts are included in the source code archive available on the PSiQCoPATH website.
In Hamiltonian evolution mode, three output files are created. In the first file, the calculated states or matrices at the desired number of output checkpoints are written. A coefficient file is also created, containing the values of the time step and of each Hamiltonian coefficient and its first and second derivatives at every time step. The last file contains the relevant parameters of the calculation and the total running time.
As derived above, the incremental evolution operators have an explicit form in terms of the Hamiltonian and its derivatives. In general, to any order, this expression involves products of the Hamiltonian with itself and with its derivatives. On the surface, this seems to indicate that calculating the incremental evolution operators is an expensive operation requiring many matrix multiplications, each scaling as $O(N^3)$ in the Hilbert space dimension $N$. However, the parametric expansion of the Hamiltonian makes it possible to push the $O(N^3)$ cost of the matrix multiplications needed to calculate the incremental evolution operators for the entire run into a single group of matrix multiplications at the beginning of the computation. The number of matrix multiplications at this step depends on the order of the calculation and the number of basis Hamiltonians employed. Under most circumstances, only a few basis Hamiltonians are required, and this step comes at minimal computational cost compared to the rest of the calculation.
The advantage comes from the observation that, up to second order for example, each incremental evolution operator is just a linear combination of the static basis Hamiltonians $H_i$ and their binary products $H_i H_j$. The coefficients of this linear combination are just combinations of the (real) scalar coefficients, their time derivatives, and the time step. Thus, as the first ``pre-computation'' step of the calculation, PSiQCoPATH calculates the set of matrices $\{H_i\}$ and $\{H_i H_j\}$.
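This pre-computation step can be sketched as follows (a simplified illustration under our own naming and storage conventions, not the PSiQCoPATH source):

```cpp
#include <complex>
#include <vector>

using cmat = std::vector<std::complex<double>>; // dense N x N, row-major

// The O(N^3) dense complex matrix-multiplication kernel.
cmat matmul(const cmat& A, const cmat& B, int N) {
    cmat C(N * N, {0.0, 0.0});
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
    return C;
}

// Pre-computation: given the static basis Hamiltonians H_1..H_M, form all
// binary products H_i * H_j once.  Every second-order incremental evolution
// operator is then just a scalar linear combination of {H_i, H_i H_j}, so no
// further O(N^3) matrix-matrix multiplies are needed during the time loop.
std::vector<cmat> precomputeProducts(const std::vector<cmat>& H, int N) {
    std::vector<cmat> prod;
    for (const cmat& Hi : H)
        for (const cmat& Hj : H)
            prod.push_back(matmul(Hi, Hj, N));
    return prod;
}
```

For $M$ basis Hamiltonians this costs $M^2$ matrix multiplications, paid once at startup.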
In our code, the simulation is divided into $p$ contiguous blocks of time of roughly equal size, where $p$ is the number of processors in use. Each processor generates and stores the incremental time evolution operators for all time steps within its allotted block of time.
Once the incremental time step evolution operators have been calculated, the final step is to combine them through matrix multiplication into the desired sequence of cumulative evolution operators. This is a straightforward application of the parallel prefix algorithm, with matrix multiplication playing the role of the associative binary operator. Because matrix multiplication is non-commutative, it is critical to maintain proper operator ordering throughout the calculation.
In general, the number of time steps is much larger than the number of processors available. In this case, the parallel prefix algorithm begins with each processor performing a serial scan operation on its own block of data. Once these local scans are complete, the standard parallel prefix binary tree ascent/descent steps are performed on the top-most (latest time) elements of each processor's data. Finally, each processor other than the root performs a second serial update of its data by right-multiplying each of its own matrices by the top-most element of its earlier time neighbor.
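The two local (per-processor) pieces of this procedure can be sketched serially as follows (our simplified illustration; the 1-D row-major matrix storage and function names are ours, not the PSiQCoPATH source). Note that later-time operators always multiply on the left:

```cpp
#include <complex>
#include <vector>

using cmat = std::vector<std::complex<double>>; // dense N x N, row-major

cmat matmul(const cmat& A, const cmat& B, int N) {
    cmat C(N * N, {0.0, 0.0});
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
    return C;
}

// Local inclusive scan: given this processor's block of incremental
// operators U[0..m-1] (earliest first), overwrite U[j] with the ordered
// product U[j] * ... * U[0].  This is the serial step each processor
// performs before the binary-tree phase of the parallel prefix.
void localScan(std::vector<cmat>& U, int N) {
    for (std::size_t j = 1; j < U.size(); ++j)
        U[j] = matmul(U[j], U[j - 1], N);   // later time on the LEFT
}

// Final update: right-multiply every local result by the top-most
// (latest-time) cumulative product received from the earlier-time neighbor.
void applyNeighborPrefix(std::vector<cmat>& U, const cmat& prefix, int N) {
    for (cmat& M : U)
        M = matmul(M, prefix, N);
}
```

Right-multiplication in the final update preserves the time ordering: the neighbor's product covers strictly earlier times, so it belongs on the right of every local result.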
When the number of time steps is much larger than the number of processors, the local serial scan steps dominate the running time, giving $O(T N^3 / p)$ scaling, where $T$ is the number of time steps, $N$ is the dimension of the system, and $p$ is the number of processors in use. Using an improvement discussed in the section on future work, the prefactor of 2 in this estimate (one factor each from the local scan and the final local update) can be reduced to $1 + f$, where $f$ is the ratio of the number of output steps requested by the user to the total number of time steps. Typically, $f$ is much less than 1, leading to a nearly 2-fold speedup.
Using the method described above, we can calculate the incremental time evolution operators $U_j$ to any desired order. The initial state can then be evolved by successively applying the incremental evolution operators to it. In terms of linear algebra, this is simply the problem of calculating $U_1 \psi_0$, $U_2 U_1 \psi_0$, $\ldots$, $U_T \cdots U_2 U_1 \psi_0$, where $\psi_0$ is an $N \times 1$ column vector and each $U_j$ is an $N \times N$ unitary matrix.
Rather than using serial/parallel scan operations to first compute all of the matrix products and then multiply by the initial state vector to get the final result, a method whose matrix-matrix products scale as $O(N^3)$, the calculation can be performed using exclusively matrix-vector multiplication operations that scale as $O(N^2)$. That is, the calculation starts by computing $\psi_1 = U_1 \psi_0$. This result can then be used to calculate $\psi_2 = U_2 \psi_1$, and so on.
Because of its $N$-fold better scaling properties, we used a data-distributed version of this latter technique of successive matrix-vector multiplication to parallelize the calculation of single state evolution. Let $p$ be the number of processing units and $N$ be the dimension of the Hilbert space. Let $T$ be the number of matrices in the operation, i.e. the length of the simulation. The algorithm is quite simple: the rows of each matrix are distributed across the processors, each processor multiplies its rows by the full current state vector, and the partial results are gathered so that every processor holds the updated vector before the next step.
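The per-step arithmetic can be sketched as follows (a simplified serial illustration; the row partitioning, function names, and the gather step are our reconstruction, not the PSiQCoPATH source):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;
using cmat = std::vector<cvec>; // N x N, one cvec per row

// One processor's share of a step: multiply the rows [r0, r1) of U by the
// full current state vector psi.  In the MPI code, the partial results from
// all processors would then be gathered so every processor holds the new
// full vector before the next step.
cvec localRowsTimesVector(const cmat& U, const cvec& psi,
                          std::size_t r0, std::size_t r1) {
    cvec out(r1 - r0, {0.0, 0.0});
    for (std::size_t i = r0; i < r1; ++i)
        for (std::size_t j = 0; j < psi.size(); ++j)
            out[i - r0] += U[i][j] * psi[j];
    return out;
}

// Single-state evolution: apply the incremental operators in time order,
// using only O(N^2) matrix-vector work per step (shown here with one
// "processor" owning all rows).
cvec evolve(const std::vector<cmat>& U, cvec psi) {
    for (const cmat& Uj : U)
        psi = localRowsTimesVector(Uj, psi, 0, psi.size());
    return psi;
}
```

Each step costs $O(N^2)$ arithmetic split across $p$ processors, plus the communication to reassemble the vector.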
The running time associated with this algorithm is the sum of the matrix-vector work, which is divided among the processors, and the communication required to reassemble the state vector at each step. Thus, if the Hilbert space dimension $N$ is large, the total is dominated by the matrix-vector multiplications. In this case, the running time is close to

$$t_{\rm run} \approx O\left(T N^2 / p\right), \qquad (4)$$

where $T$ is the number of time steps and $p$ the number of processors.
This case is perhaps too simple to use as a test case, though, as the Hamiltonian is independent of time. The second test was to have PSiQCoPATH simulate the behavior of a spin-1/2 magnetic moment in a time-varying magnetic field. The magnetic field varied in time according to the equation $\vec{B}(t) = B_0\,(\cos(\omega t)\,\hat{x} + \sin(\omega t)\,\hat{y})$, where $\hat{x}$ and $\hat{y}$ are unit vectors in the $x$ and $y$ directions, respectively. The Hamiltonian for this system is

$$H(t) = -\vec{\mu}\cdot\vec{B}(t) \qquad (5)$$

$$\phantom{H(t)} = -\frac{\hbar\omega_0}{2}\left(\cos(\omega t)\,\sigma_x + \sin(\omega t)\,\sigma_y\right), \qquad (6)$$

where $\omega_0$ is the precession frequency of the spin in a static field of magnitude $B_0$.
In the case where $\omega \ll \omega_0$, the motion is very nearly adiabatic. That is, the spin direction very closely follows the field direction. As $\omega$ increases, the trajectory acquires increasingly large cycloid-like wiggles. Although this behavior was not expected beforehand, in retrospect it is easy to understand.
[Figure: a cone rolling on a flat surface (cone.jpg)]
The key to this understanding is a mapping that we discovered between this problem and the trajectory of a point on the rim of a cone rolling on a flat surface (see the cone figure). This mapping comes from the fact that a magnetic field generates rotations about its direction with frequency $\omega_0$. In our case, this rotation axis is itself rotating with frequency $\omega$ in the $x$-$y$ plane.
The rolling motion of a cone on a flat surface consists of two combined rotations - the cone rotates about its own symmetry axis with frequency $\omega_1$ and about the vertical axis through its tip with frequency $\Omega$. Under the condition of rolling without slippage, the combined effect of these two angular velocities is a net angular velocity along the line of contact between the cone and the surface. As the cone rolls, the direction of this instantaneous axis of rotation rotates about the vertical direction with frequency $\Omega$.
Thus the instantaneous axis of rotation in the cone problem has exactly the same behavior as the instantaneous axis of rotation (the magnetic field) in our spin problem. At a given instant, all points on the cone are rotating about the instantaneous axis of rotation, just as at any given instant the spin is precessing about the instantaneous direction of the magnetic field.
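The trigonometry can be written out as follows. In our notation (the symbols are ours): $\omega_0$ is the precession frequency about the field, $\omega$ is the field's rotation frequency, and $\theta$ is the cone half-angle; we take the contact line along $\hat{x}$:

```latex
% Decompose the cone's net angular velocity (along the horizontal contact
% line) into spin about the symmetry axis, which sits at angle \theta above
% the surface, plus precession about the vertical:
\omega_{\rm net}\,\hat{x}
  = \Omega\,\hat{z} + \omega_1\left(\cos\theta\,\hat{x} + \sin\theta\,\hat{z}\right)
% x-components:  \omega_{\rm net} = \omega_1 \cos\theta
% z-components:  0 = \Omega + \omega_1 \sin\theta
% Eliminating \omega_1:  |\Omega| / \omega_{\rm net} = \tan\theta.
% Identifying \omega_{\rm net} with \omega_0 and |\Omega| with \omega gives
\tan\theta = \frac{\omega}{\omega_0}
```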
The half-angle $\theta$ of the cone corresponding to a particular choice of $\omega$ and $\omega_0$ in the quantum spin problem can be found by simple trigonometry, and is given by the relation

$$\tan\theta = \omega / \omega_0. \qquad (7)$$
Aside from the intellectual interest of this result, it also turns out to be one of the rare cases of a Hamiltonian with non-trivial time dependence for which we have an exact answer to compare with the simulation. The agreement with our results appears to be quite good, though we have not explored it in quantitative detail.
Once this method validation was complete, we used PSiQCoPATH to simulate the solution of a few instances of NP-complete problems using the method of quantum computation by adiabatic evolution described in the introduction and the reference given there. The particular problem for which we had easy access to the proper Hamiltonians was the so-called exact cover problem. Exact cover is a version of satisfiability involving 3-bit clauses of the form

$$x_i + x_j + x_k = 1, \qquad (8)$$

i.e. each clause is satisfied when exactly one of its three bits is set.
This problem is described in detail in the paper by Farhi et al. As in that paper, we use a linear interpolation between the initial and final Hamiltonians of the form

$$H(s) = (1 - s)\,H_B + s\,H_P, \qquad (9)$$

$$s = t/T, \qquad (10)$$

where $H_B$ is the initial Hamiltonian, $H_P$ is the problem Hamiltonian whose ground state encodes the solution, and $T$ is the total run time.
We would like to thank Daniel Nagaj for supplying us with these Hamiltonians. An example of our results for a 6-qubit instance of Exact Cover is shown in the figure below.
[Figure: eigenstate populations for a 6-qubit instance of Exact Cover (EC6.jpg)]
These plots were generated by a Matlab script we wrote to parse the PSiQCoPATH output files and perform the desired analysis. The script diagonalizes the system's Hamiltonian at each output time step and transforms the evolved state at the corresponding time step into this eigenbasis. Each eigenstate population is equal to the square magnitude of the corresponding component of the evolved state in the instantaneous eigenbasis. For the faster run, we see that the probability of finishing in the ground state, i.e. of obtaining the correct solution to the problem, is approximately 60%. When the run time is increased, this probability is very nearly 1.
In the accompanying figure, the energy levels (eigenvalues) of the instantaneous Hamiltonian are plotted over the course of the evolution. Notice that the position of the minimum energy gap is precisely where the ground state population gets ``lost'' in the fast run. This is what is expected from the considerations of the adiabatic theorem, and makes for a nice confirmation of the theory.
In the end we were only able to test up to 8 qubits. This is really not enough to make progress over the current state of the art in research on this topic, but with the improvements described in the section on future work we should be able to scale up to much higher dimension. All results were obtained from full time evolution operator calculations. Although this calculation is in a sense overkill for what we have used it for in the analysis, the full time evolution matrix could be used to find the success probability in the case where the initial state is actually a ``mixed state'' due to thermal noise and/or uncertain preparation. This is an interesting situation to look at from a practical point of view, as it is a more realistic picture of the situation in real physical implementations.
With the distributed data approach of the matrix-vector multiplication single state evolution algorithm, we should be able to reach even larger systems. Memory usage is significantly lower in that case, and the distribution across processors should allow us to handle much larger matrices without having to reach beyond the cache/fast memory. This code did not become operational until after the tests described, so we only have detailed results for the full time evolution operator calculations.
Throughout these runs we also kept track of PSiQCoPATH's performance in terms of running time. Running time as a function of simulation length and number of processors is plotted in the accompanying figure. The trends are very nearly linear in both the number of time steps $T$ and the reciprocal of the processor count, $1/p$, confirming our projected scaling rules.
We recently realized that it is not necessary to ever have all of the incremental time evolution operators stored at one time. In general, the number of output timesteps requested by the user is much less than the number of actual timesteps performed in the evolution (by a factor of perhaps 1000). Rather than storing all of the incremental operators, all we really need is that fraction of them corresponding to the much coarser output time step.
A much more efficient procedure would be to partially combine the first two steps in the following way. Let $k$ be the ratio of the total number of time steps to the number of output steps requested. Instead of storing every incremental operator $U_j$, we really only need to store each coarse-grained product $V_m = U_{mk} \cdots U_{(m-1)k+1}$. We can build up $V_m$ by multiplying by each successive incremental time evolution operator as it is generated. Once $k$ time steps have been calculated and combined, that $V_m$ can be stored in memory and the next one started. In this way, the memory requirements of the program will be cut roughly by a factor of $k$. As an additional benefit, the final local update step will also be shortened by a factor of $k$. Overall this leads to a speedup of approximately $2/(1 + 1/k) \approx 2$ in the projected running time.
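The accumulation loop could be sketched as follows (our illustration of the proposed scheme, not existing PSiQCoPATH code; storage layout and names are ours):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cmat = std::vector<std::complex<double>>; // dense N x N, row-major

cmat matmul(const cmat& A, const cmat& B, int N) {
    cmat C(N * N, {0.0, 0.0});
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
    return C;
}

// Accumulate k successive incremental operators into one coarse operator
// V_m = U_{mk} ... U_{(m-1)k+1}, keeping only the V_m.  Only one running
// product needs to be in memory at a time.
std::vector<cmat> coarseOperators(const std::vector<cmat>& U,
                                  std::size_t k, int N) {
    std::vector<cmat> V;
    cmat acc;
    for (std::size_t j = 0; j < U.size(); ++j) {
        if (j % k == 0) acc = U[j];              // start a new block
        else acc = matmul(U[j], acc, N);         // later time on the LEFT
        if ((j + 1) % k == 0) V.push_back(acc);  // block complete: store it
    }
    return V;
}
```

In practice the incremental operators would be consumed one at a time as they are generated, so the full array `U` never needs to exist; it appears here only to keep the sketch self-contained.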
Also, we did not fully explore alternative basic arithmetic algorithms that could speed up the system. For instance, Strassen's algorithm for matrix multiplication runs asymptotically faster than the $O(N^3)$ standard algorithm. However, Strassen's algorithm has significantly different memory usage, and its overhead is much larger than that of the standard matrix multiply. Thus, simply using Strassen's algorithm would not necessarily be an improvement. Possible changes of this sort are worth considering, and can easily be integrated and tested in our code due to its object-based construction.
Throughout the course on parallel computing, most of the skeletal code we were given was not object-oriented. We made it a point to make our code as object-oriented as possible. Over the course of the project we found that many aspects of object oriented programming carry over directly to the parallel setting, but we also encountered some new challenges unique to the playing field of parallelism.
At a high level, the big advantage of object-oriented programming is the power of abstraction. We employed such abstraction with objects that we knew would be parallelized. For instance, in the ComplexMatrix class, we have methods such as send, receive, and rowDist to communicate matrix objects between processors.
The send and receive methods were relatively simple and worked well. These routines send entire matrices to/from other processors via MPI. An alternative approach would have been to define a new MPI datatype for objects of the ComplexMatrix class, but we found it much simpler to add these communication methods to the class itself.
The rowDist method was somewhat awkward. Its goal was to distribute the data of a matrix stored entirely on a single processor to all other processors in row-wise fashion. As with an MPI_Bcast command, every processor calls rowDist. Before calling it, however, each processor had to determine how many rows it would store. This added an extra step to the method, but was not an insurmountable challenge.
Overall, we found object-oriented techniques to be very useful for maintaining abstractions in an MPI-driven parallel program. However, classes should be designed with parallelism in mind to achieve maximum robustness. It is not always easy to simply sprinkle MPI routines into a class after it has already been designed.
Due to the growing importance of quantum systems to the information technology industry as well as their intrinsic scientific interest, it is important to be able to simulate quantum dynamics as accurately and efficiently as possible. We have developed PSiQCoPATH in an attempt to apply the power of parallel computing to the accurate simulation of a very wide class of quantum problems. Validation was performed on several test systems and showed excellent agreement with analytical predictions.
While there are certainly issues we could have considered in greater detail, this project has been a huge success as a proof of concept. Quantum systems can indeed be simulated on parallel machines efficiently as a means for learning more about the quantum mechanics of various physical systems.