Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

affine gap penalty #3

Open
chchch opened this issue Oct 21, 2021 · 2 comments
Open

affine gap penalty #3

chchch opened this issue Oct 21, 2021 · 2 comments

Comments

@chchch
Copy link

chchch commented Oct 21, 2021

Hi,

I've been trying to add support for an affine gap penalty, but I'm finding it difficult to reason through the code. Essentially, I need to remember a_gap and b_gap for each subsequent step so that I can apply a "gap opening penalty".

PS I also noticed that you cite Chin, Ho, Lam, Wong and Chan (2003), but that paper introduces a constrained sequence which doesn't seem to be used in this algorithm — is that right?

Thanks!

@robinp
Copy link
Owner

robinp commented Oct 21, 2021

Hello - just had a quick look on https://web.stanford.edu/class/cs262/presentations/lecture2.pdf about affine gaps. It suggests that one needs to account the best score for a given (i,j) pair separately for the cases if the gap is already open or not.

I can imagine that would mean adding a third, binary dimension to the coordinates, like (i, j, Open|NotOpen), and then when moving horizontally/verticall, you move from (i, j, NotOpen) to either (i-1, j, Open). When moving diagonally, you move from (i, j, _) to (i-1, j-1, NotOpen).

Just a hunch. Not sure about the reference, please excuse me. Also, the code indeed could have more type signatures / guidance.

@chchch
Copy link
Author

chchch commented Oct 26, 2021

Hi,

It seems that you actually need to save three scores at each step, rather than just the max score. Here's the algorithm from Gusfield 1997 Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, 244:
image

This is my (pretty ugly) attempt at implementing it... I'm not so good with monads and do-notation:

Trace a b `tappend'` Trace  y (z:_) = Trace (a+y) (z:b)

data AlignConfig' a s = AlignConfig'
  { ac'PairScore :: a -> a -> s
  , ac'_initial_gap_penalty :: s
  , ac'_gap_penalty :: s
  , ac'_gap_opening_penalty :: s
  }

alignConfig' :: (a -> a -> s)  -- ^ Scoring function.
            -> s               -- ^ Initial gap score.
            -> s               -- ^ Gap score.
            -> s               -- ^ Gap opening score.
            -> AlignConfig' a s
alignConfig' = AlignConfig'

affineAlign :: (G.Vector v a, Num s, Ord s)
  => AlignConfig' a s
  -> v a  -- ^ Left sequence.
  -> v a  -- ^ Right sequence.
  -> Trace a s
affineAlign AlignConfig'{..} as bs =
  let p = (lastIndex as, lastIndex bs)
  in revTrace . fst $ evalState (go p) M.empty
  where
  revTrace (Trace s t) = Trace s (reverse t)
  lastIndex v = G.length v - 1
  --
  go p = do
    res <- gets $ M.lookup p
    case res of
        Just r -> return r
        Nothing -> do
            newRes <- pgo p
            modify (M.insert p newRes)
            return newRes
  --
  pgo (i,j)
    | i == (-1) || j == (-1) = return $
      if i == j then (Trace 0 [],(Trace 0 [],Trace 0 []))
      else if i == (-1)
           then skipInit j stepRight bs
           else skipInit i stepLeft as
    | otherwise = do
      let a = as G.! i
          b = bs G.! j
      diag  <- go (i-1,j-1)
      let diag_max = (fst diag) `tappend'` Trace (ac'PairScore a b) [stepBoth a b]
      
      a_gaps <- go (i-1,  j)
      let a_gap1 = (fst a_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepLeft a]
      let a_gap2 = (fst . snd) a_gaps `tappend'` Trace ac'_gap_penalty [stepLeft a]
      let a_gap_max = L.maximumBy (comparing traceScore) [a_gap1, a_gap2]
      
      b_gaps <- go (  i,j-1)
      let b_gap1 = (fst b_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepRight b]
      let b_gap2 = (snd . snd) b_gaps `tappend'` Trace ac'_gap_penalty [stepRight b]
      let b_gap_max = L.maximumBy (comparing traceScore) [b_gap1, b_gap2]
      
      let max = L.maximumBy (comparing traceScore) [diag_max, a_gap_max, b_gap_max]
      return (max,(a_gap_max,b_gap_max))
  --
  skipInit idx stepFun xs =
    let score = ac'_initial_gap_penalty * fromIntegral (idx+1)
        tr = reverse [stepFun (xs G.! xi) | xi <- [0..idx]]
    in (Trace score tr,(Trace score tr,Trace score tr))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants