To achieve high scale and high performance in the CGN case, implement algorithmic mapping, where each inside user is statically mapped to a fixed set of outside ports (a contiguous port range on one outside address), for example:
10.0.0.1 -> 1.1.1.1:<1024-2048>
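A minimal sketch of the static inside-to-outside mapping, assuming inside users are numbered consecutively from a base address, every user gets an equally sized 1024-port block starting at port 1024, and blocks are packed onto a list of outside addresses. The names PORT_BASE, PORTS_PER_USER, BLOCKS_PER_ADDR, static_map, inside_base and outside_pool are hypothetical, and the range 1024-2048 from the example is read as half-open (2048 excluded).

```python
import ipaddress

# Hypothetical parameters (assumptions, not taken from the note beyond the example).
PORT_BASE = 1024                                         # first outside port handed out
PORTS_PER_USER = 1024                                    # size of each user's port block
BLOCKS_PER_ADDR = (65536 - PORT_BASE) // PORTS_PER_USER  # 63 blocks per outside address


def static_map(inside, inside_base, outside_pool):
    """Map an inside address to (outside address, half-open outside port range)."""
    index = int(ipaddress.IPv4Address(inside)) - int(ipaddress.IPv4Address(inside_base))
    if index < 0:
        raise ValueError("inside address is below inside_base")
    # Which outside address, and which block on that address, this user gets.
    addr_index, block = divmod(index, BLOCKS_PER_ADDR)
    if addr_index >= len(outside_pool):
        raise ValueError("outside address pool exhausted")
    start = PORT_BASE + block * PORTS_PER_USER
    return outside_pool[addr_index], range(start, start + PORTS_PER_USER)
```

With inside_base "10.0.0.1" and pool ["1.1.1.1", "1.1.1.2"], static_map("10.0.0.1", ...) yields ("1.1.1.1", range(1024, 2048)), matching the example. Because the mapping is a pure function of the inside address, no per-user state is needed to find a user's block.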
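The inside port space can be compressed into the outside port space with a simple modulo function. Support endpoint-dependent mapping to deal with the resulting overloading of the outside ports, i.e. several inside ports sharing one outside port toward different remote destinations.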
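A minimal sketch of the modulo compression and of an endpoint-dependent session key, assuming the user's block is 1024 ports starting at 1024 as in the example. compress_port and session_key are hypothetical helper names.

```python
# Hypothetical block parameters, chosen to match the 1.1.1.1:<1024-2048> example.
BLOCK_START = 1024
BLOCK_SIZE = 1024


def compress_port(inside_port, block_start=BLOCK_START, block_size=BLOCK_SIZE):
    """Fold the 16-bit inside port space into the user's outside port block."""
    return block_start + inside_port % block_size


def session_key(outside_addr, outside_port, remote_addr, remote_port):
    """Endpoint-dependent key: the remote endpoint is part of the session identity,
    so the same outside port can be reused toward different destinations."""
    return (outside_addr, outside_port, remote_addr, remote_port)
```

Inside ports 5000 and 6024 both compress to outside port 1928 (5000 % 1024 == 6024 % 1024 == 904). Including the remote address and port in the key is what lets both flows coexist as long as they talk to different remote endpoints.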
Figure out what to do on collisions, i.e. when the outside port / remote destination port combination already exists for another session.
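One possible collision policy, sketched under assumptions: try the modulo-derived port first, then probe linearly (with wrap-around) through the rest of the user's block until the endpoint-dependent key is free, and give up if every port in the block is already in use toward that remote endpoint. This is only a candidate for the open question above, not a decided design. SessionTable and allocate are hypothetical names, and the block parameters are the same assumed 1024-port block as the example.

```python
# Hypothetical block parameters (assumed, as in the example mapping).
BLOCK_START = 1024
BLOCK_SIZE = 1024


class SessionTable:
    """Toy endpoint-dependent session table (hypothetical, not an existing API)."""

    def __init__(self):
        # (outside addr, outside port, remote addr, remote port) -> (inside addr, inside port)
        self.out2in = {}
        # (inside addr, inside port, remote addr, remote port) -> outside port
        self.in2out = {}

    def allocate(self, inside_addr, inside_port, outside_addr, remote_addr, remote_port,
                 block_start=BLOCK_START, block_size=BLOCK_SIZE):
        """Return the outside port for this flow, or None if the block is exhausted."""
        flow = (inside_addr, inside_port, remote_addr, remote_port)
        if flow in self.in2out:
            return self.in2out[flow]  # reuse the existing session for this flow
        preferred = inside_port % block_size  # modulo-derived offset tried first
        for step in range(block_size):
            port = block_start + (preferred + step) % block_size
            key = (outside_addr, port, remote_addr, remote_port)
            if key not in self.out2in:  # no other session owns this 4-tuple
                self.out2in[key] = (inside_addr, inside_port)
                self.in2out[flow] = port
                return port
        return None  # every port in the block is busy toward this remote endpoint
```

For example, inside ports 5000 and 6024 toward the same remote 8.8.8.8:53 collide on 1928, so the second one is moved to 1929; toward a different remote endpoint, 6024 can keep 1928. The trade-off is that a probed port can no longer be derived from the inside port alone, so the reverse path must always consult the session table.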