[PTB90] Paul T. Boggs and Janet E. Rogers. Orthogonal Distance Regression. In P.J. Brown and Wayne A. Fuller, editor, Contemporary Mathematics. American Mathematical Society, Providence, Rhode Island, 1990.
Given a model $f$ so that
\[ y_i = f(x_i; \boldsymbol{\beta} ) \]
where we have $n$ measurements with errors $(\delta_i; \epsilon_i)$, $i=0 \ldots n-1$ and $p$ parameters $\boldsymbol{\beta}$. The measurements are related to the true values by
$X_i = x_i - \delta_i$
$Y_i = y_i - \epsilon_i$
The model can also be written as
\[Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i.\]
The function f can be either linear or non-linear in its arguments, but must be a smooth function of these arguments.
For each point ${(X_i, Y_i)}$, we would like to minimise the distance to the curve $f(x; \boldsymbol{\beta})$, i.e.
\[ \text{argmin}_{\epsilon_i, \delta_i} \left\{\epsilon_i^2 + \delta_i^2\right\} \quad \text{subject to} \quad Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i. \]
It is assumed that $\epsilon_i \sim \mathcal{N}(0, \sigma_{\epsilon_i^2})$ and $\delta_i \sim \mathcal{N}(0, \sigma_{\delta_i^2})$ (zero mean), from which it follows that, overall, we should minimise the sum of the squares:
\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}, \boldsymbol{\epsilon}} \sum_{i=0}^{n-1}{\epsilon_i^2 + \delta_i^2} \quad \text{subject to}\quad Y_i = f(X_i + \delta_i; \boldsymbol{\beta}) - \epsilon_i \]
which can be rewritten without $\boldsymbol{\epsilon}$ as
\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} [f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i] + \delta_i^2. \]
By associating weights with the above equation, the weighted Orthogonal Distance Regression problem is defined as
\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} w_i^2 [f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i] + d_i^2 \delta_i^2, \]
where common choices for $w_i$ and $d_i$ are
\[ w_i = \frac{1}{\sigma_{\epsilon_i}} \ \text{and}\ d_i = \frac{\sigma_{\epsilon_i}}{\sigma_{\delta_i}} \]
which reduces the minimisation problem to
\[ \text{argmin}_{\boldsymbol{\beta}, \boldsymbol{\delta}} \sum_{i=0}^{n-1} \frac{1}{\sigma_{\epsilon_i}} \left\{ f(X_i + \delta_i; \boldsymbol{\beta}) - Y_i + \frac{1}{\sigma_{\delta_i^2}} \delta_i^2 \right\}. \]
The article claims that the algorithm performs better than OLS even when the ratios $d_i$ are only known to a factor of 10. The algorithms utilises a Levenberg-Marquardt like step with trust-region. The implementation makes use of special structures in different matrices to yield a solution in $O(np^2)$ time.