Now that we have our circuit base, let's look at the error sources in this circuit and try to fix them.
In this part we'll examine the heart of the circuit, shown below. PSRR issues will be left until the next part.

Note : I renumbered the transistors and added Q3. For now, the current source transistors, which always see constant current and constant Vce, will not be considered. However, Q3, which is part of the lower current source, does not have a constant Vce, so it is included.
This is the impedance the DAC sees. It is about 2 Ohms. This should not be a problem and I don't think it warrants a complex circuit to bring it lower, unless it can be had "for free".
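As a rough sanity check on the 2 Ohm figure: the small-signal impedance looking into the emitter of a plain common-base stage is approximately Vt/Ic. This is only a sketch. The bias current is not stated here, so the 13 mA below is an assumption picked for illustration, and base resistance and any local feedback in the real circuit are ignored.

```python
# First-order emitter impedance estimate: re ~= Vt / Ic.
# Ignores base spreading resistance and feedback; bias value is an assumption.

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage Vt = kT/q, in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def emitter_impedance(ic_amps, temp_kelvin=300.0):
    """Approximate small-signal impedance seen at the emitter, in ohms."""
    return thermal_voltage(temp_kelvin) / ic_amps


if __name__ == "__main__":
    assumed_bias = 13e-3  # A, hypothetical bias current (not from the article)
    print("re ~= %.2f Ohm at %.0f mA" % (emitter_impedance(assumed_bias),
                                          assumed_bias * 1e3))
```

With this approximation, an impedance of a couple of ohms corresponds to a bias current on the order of 10 mA.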
I think we can safely ignore the effect of Q1 Vbe variation on the DAC. Q1 Vce is fixed by Q2, so the only error coming out of Q1 is the base current, which is taken out of the signal path.
Q2 has signal-dependent Vce. Q2 Vbe variation with Ic and Vce is not very important because it ends up on Q1's collector. Again, the only interesting error path is the base current.
Q3 also has signal-dependent Vce. If Q3 is part of the lower current source, this will be a problem, thus Q3 will be a cascode for the lower current source, which makes its base current the only error path.
So, the three transistors' base currents are likely to be the dominant error source, not considering the power supply for now.
So, what determines base currents ?
- Ic and hFe : hFe varies with Ic, Vbe, temperature, and transistor model and manufacturer.
- parasitic capacitances : they are nonlinear and also vary with operating parameters.
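To make the first point concrete, here is a minimal sketch of how a current-dependent hFe turns base current into a nonlinear error. The hFe model (a simple high-current roll-off with a knee current) and all numeric values are assumptions for illustration only; they are not taken from any datasheet or from the simulated circuit.

```python
# Toy model: base current Ib = Ic / hFe(Ic), with hFe dropping at high Ic.
# Model shape and all parameter values are hypothetical.

def hfe_constant(ic, hfe0=300.0):
    """Ideal transistor: hFe does not depend on Ic."""
    return hfe0


def hfe_toy(ic, hfe0=300.0, i_knee=50e-3):
    """hFe that rolls off as Ic approaches an assumed knee current."""
    return hfe0 / (1.0 + ic / i_knee)


def sweep_base_current(i_bias, i_swing, hfe_func, points=101):
    """Sweep Ic over i_bias +/- i_swing and return (Ic list, Ib list)."""
    ics = [i_bias - i_swing + 2.0 * i_swing * k / (points - 1)
           for k in range(points)]
    ibs = [ic / hfe_func(ic) for ic in ics]
    return ics, ibs


def max_deviation_from_line(xs, ys):
    """Largest distance between ys and the straight line through its endpoints."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return max(abs(y - (ys[0] + slope * (x - xs[0]))) for x, y in zip(xs, ys))


if __name__ == "__main__":
    for name, model in (("constant hFe", hfe_constant), ("toy hFe", hfe_toy)):
        ics, ibs = sweep_base_current(5e-3, 1e-3, model)
        print("%-12s nonlinear Ib error: %.3g A"
              % (name, max_deviation_from_line(ics, ibs)))
```

With a constant hFe the base current is just a scaled copy of the signal, which is a gain error but not distortion. Once hFe depends on Ic, a signal-dependent term appears, which is why biasing a transistor near the knee of its hFe/Ic curve hurts linearity.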
See this page for more info on transistor selection. I will have to pore over a lot of datasheets, or so it seems.

The upper curve indicates the linearity of Vout versus input current. We want a straight horizontal line. The middle curve shows the contribution of each base current to the total error. The main culprit is obviously Q2. Maybe it wasn't a good idea to use this transistor with a bias current right where the knee of the hFe/Ic curve starts.
So, I duplicated the circuit (original traces are in purple in the graph below) and replaced Q2 with a BC807, a high-current, high-beta, low-power SMD transistor. This change costs nothing and the circuit linearity gets a bit better. So, a single transistor change can really change a circuit, especially when it works open-loop like this one.

We can probably get better than this. Stepping Rout shows the culprit is the Vce variation on the output transistors. With a smaller output resistor, we get much better linearity. However, a high output voltage is necessary to get a good signal/noise ratio, so we have a problem.
A solution could be to replace the two cascodes by CFPs :
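The Rout trade-off can be sketched numerically. The snippet below considers only the thermal noise of Rout itself (noise from the following stage, which usually matters more, is ignored), and the DAC full-scale current, bandwidth, and resistor values are hypothetical.

```python
# Rout trade-off sketch: a larger Rout means a larger Vce swing on the output
# transistors, but a better ratio of signal to the resistor's own thermal noise.
# All circuit values are assumptions for illustration.
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def output_swing_pp(i_pp, r_out):
    """Peak-to-peak output voltage, which is also the Vce swing seen by the output transistors."""
    return i_pp * r_out


def snr_db(i_rms, r_out, bandwidth_hz=20e3, temp_kelvin=300.0):
    """Signal to thermal-noise ratio of Rout alone, in dB."""
    v_signal = i_rms * r_out
    v_noise = math.sqrt(4.0 * K_BOLTZMANN * temp_kelvin * r_out * bandwidth_hz)
    return 20.0 * math.log10(v_signal / v_noise)


if __name__ == "__main__":
    i_pp = 1e-3                            # A, assumed DAC full-scale swing
    i_rms = i_pp / (2.0 * math.sqrt(2.0))  # sine wave at full scale
    for r_out in (100.0, 470.0, 1000.0, 2200.0):
        print("Rout = %6.0f Ohm : swing = %.2f Vpp, SNR = %.1f dB"
              % (r_out, output_swing_pp(i_pp, r_out), snr_db(i_rms, r_out)))
```

Signal voltage grows proportionally to Rout while its thermal noise grows only as the square root, so each doubling of Rout buys about 3 dB here, at the price of doubling the Vce swing that causes the nonlinearity.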

From the simulation results below, this gives a huge boost in linearity. However CFPs really like to oscillate ; I have already tried this schematic and it did. So, take this with a grain of salt. Simulation shows a nice peak at 100 MHz.
Pinkish colours are the last two simulations.

So, let's replace Q2 and Q3 by good old trusty darlingtons. The simulation results are about the same (except for the oscillations). It's good. So far this is the best schematic.
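A quick way to see why darlingtons help with the base current error: the pair behaves roughly like a single transistor whose current gain is b1*b2 + b1 + b2. The sketch below uses hypothetical beta values and ignores the variation of beta with Ic and Vce.

```python
# Base current drawn from the signal path: single transistor vs darlington.
# Beta values and collector current are assumptions for illustration.

def darlington_beta(beta1, beta2):
    """Effective current gain of a darlington pair (total Ic / input Ib)."""
    return beta1 * beta2 + beta1 + beta2


def base_current(ic, beta):
    """Base current needed to sustain collector current ic."""
    return ic / beta


if __name__ == "__main__":
    ic = 5e-3     # A, hypothetical collector current
    beta = 200.0  # hypothetical gain of each transistor
    ib_single = base_current(ic, beta)
    ib_pair = base_current(ic, darlington_beta(beta, beta))
    print("single transistor : Ib = %.3g A" % ib_single)
    print("darlington        : Ib = %.3g A" % ib_pair)
    print("error current reduced by a factor of %.0f" % (ib_single / ib_pair))
```

Since the error current is divided by roughly one more factor of beta, any nonlinearity it carries shrinks by about the same amount, without the local feedback loop that makes the CFP prone to oscillation.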

There is also this option. While having a visually pleasing symmetry and possible distortion cancellation, it has a tiny problem : thermal runaway of the input transistors. Therefore, it will need a Vbe multiplier and all the gimmicks of a power amplifier output stage, much increased complexity with dubious sonic benefits. Besides, distortion cancellation is only real if the currents in both transistors are perfectly balanced, which is, in practice, impossible to achieve.

This is reminiscent of the input stage from a current feedback amplifier. If an opamp should be used, maybe this type can yield better results.
