diff --git a/RADAR2TEI/TestData/abs.tei b/RADAR2TEI/TestData/abs.tei new file mode 100644 index 0000000..8b736dd --- /dev/null +++ b/RADAR2TEI/TestData/abs.tei @@ -0,0 +1,3192 @@ + + + + + + + + + Title + + +

Publication Information

+
+ +

Information about the source

+
+
+ + + + A2.5. - Software engineering + A3.3.2. - Data mining + A3.4.1. - Supervised learning + A3.4.2. - Unsupervised learning + A6.1.4. - Multiscale modeling + A6.2.4. - Statistical methods + A6.2.8. - Computational geometry and meshes + A8.1. - Discrete mathematics, combinatorics + A8.3. - Geometry, Topology + A8.7. - Graph theory + A9.2. - Machine learning + + + B1.1.1. - Structural biology + B1.1.5. - Immunology + B1.1.7. - Bioinformatics + + + +
+ + +
+ + + + Frédéric + Cazals + + Chercheur + + Sophia + + Team leader, Inria, Senior Researcher + hdr + + + + Dorian + Mazauric + + Chercheur + + Sophia + + Inria, Researcher + hdr + + + + Edoardo + Sarti + + Chercheur + + Sophia + + Inria, Researcher, from Oct 2021 + + + + Vladimir + Krajnak + + PostDoc + + Sophia + + Inria, from Oct 2021 + + + + Timothee + O Donnell + + PhD + + Sophia + + Inria + + + + Louis + Goldenberg + + Stagiaire + + Sophia + + Inria, from Mar 2021 until Aug 2021 + + + + Aarushi + Gupta + + Stagiaire + + Sophia + + Inria, from May 2021 until Aug 2021 + + + + Valentin + Madelaine + + Stagiaire + + Sophia + + Inria, until Mar 2021 + + + + Maximilien + Martin + + Stagiaire + + Sophia + + École Normale Supérieure de Lyon, from Oct 2021 + + + + Florence + Barbara + + Assistant + + Sophia + + Inria + + + + Charles + Robert + + CollaborateurExterieur + + Sophia + + CNRS + hdr + + + + Konstantin + Roeder + + CollaborateurExterieur + + Sophia + + Robinson College - Cambridge + + +
+
+ + + ABS + Algorithms - Biology - Structure + + Computational Biology + Digital Health, Biology and Earth + http://team.inria.fr/abs + Creation of the Project-Team: 2008 July 01 + + A2.5. - Software engineering + A3.3.2. - Data mining + A3.4.1. - Supervised learning + A3.4.2. - Unsupervised learning + A6.1.4. - Multiscale modeling + A6.2.4. - Statistical methods + A6.2.8. - Computational geometry and meshes + A8.1. - Discrete mathematics, combinatorics + A8.3. - Geometry, Topology + A8.7. - Graph theory + A9.2. - Machine learning + + + B1.1.1. - Structural biology + B1.1.5. - Immunology + B1.1.7. - Bioinformatics + + + + + + + +
+
+ +
+ Overall objectives +
+ Biomolecules and their function(s). +

 Computational Structural Biology (CSB) is the scientific domain concerned with the development of algorithms + and software to understand and predict the structure and function of biological macromolecules. This research + field is inherently multi-disciplinary. On the experimental side, biology and medicine provide the objects + studied, while biophysics and bioinformatics supply experimental data, which are of two main kinds. On the one + hand, genome sequencing projects supply protein sequences, and ~200 million sequences have been + archived in UniProtKB/TrEMBL – which collects the protein sequences + yielded by genome sequencing projects. On the other hand, structure determination experiments (notably X-ray + crystallography, nuclear magnetic resonance, and cryo-electron microscopy) give access to geometric models of + molecules – atomic coordinates. Alas, only ~150,000 structures have been solved and deposited in the Protein + Data Bank (PDB), a number to be compared against the $10^8$ sequences found in UniProtKB/TrEMBL. With one structure for ~1000 sequences, we hardly know anything + about biological functions at the atomic/structural level. Complementing experiments, physical + chemistry/chemical physics supply the required models (energies, thermodynamics, etc). More specifically, let us + recall that a protein with $n$ atoms has $d = 3n$ Cartesian coordinates, and fixing + these (up to rigid motions) defines a conformation. As conveyed by the iconic lock-and-key + metaphor for interacting molecules, biology is based on the interactions stable conformations make with each + other. Turning these intuitive notions into quantitative ones requires delving into statistical physics, as + macroscopic properties are average properties computed over ensembles of conformations. Developing effective + algorithms to perform accurate simulations is especially challenging for two main reasons. 
The first one is the + high dimension of conformational spaces – see $d = 3n$ above, typically several tens of + thousands, and the nonlinearity of the energy functionals used. The second one is the multiscale nature of the + phenomena studied: with biologically relevant time scales beyond the millisecond, and atomic vibration periods + of the order of femtoseconds, simulating such phenomena typically requires $10^{12}$ conformations/frames, a (brute) + tour de force rarely achieved.

+

+

+
+ Computational Structural Biology: three main challenges. +

 The first challenge, sequence-to-structure prediction, aims to infer the possible + structure(s) of a protein from its amino acid sequence. While progress has recently been made, using in + particular deep learning techniques, the models obtained so far + are static and coarse-grained.

+

The second one is protein function prediction. Given a protein with known structure, i.e. 3D coordinates, the goal is to predict the partners of this protein, in terms of stability + and specificity. This understanding is fundamental to biology and medicine, as illustrated by the example of the + SARS-CoV-2 virus responsible for the COVID-19 pandemic. To infect a host, the virus first fuses its envelope with + the membrane of a target cell, and then injects its genetic material into that cell. Fusion is achieved by a + so-called class I fusion protein, also found in other viruses (influenza, SARS-CoV-1, HIV, etc). The fusion + process is highly dynamic and involves large amplitude conformational changes of the molecules. It is + poorly understood, which hinders our ability to design therapeutics to block it.

+
+ Figure 1: The synergy modeling - experiments, and challenges faced in CSB: illustration + on the problem of designing miniproteins blocking the entry of SARS-CoV-2 into cells. From . Of note: the first step of the infection by + SARS-CoV-2 is the attachment of the receptor binding domain of its spike (RBD, blue molecule) to a target + protein found on the membrane of our cells, ACE2 (orange molecule). A strategy to block infection is therefore + to engineer a molecule binding the RBD, preventing its attachment to ACE2. (A) Design of + a helical protein (orange) mimicking a region of the ACE2 protein. (B) Assessment of + binding modes (conformation, binding energies) of candidate miniproteins neutralizing the RBD. +
+

Finally, the third one, large assembly reconstruction, aims at solving (coarse-grain) + structures of molecular machines involving tens or even hundreds of subunits. This research vein was promoted + about 15 years ago by the work on the nuclear pore complex . It is often + referred to as reconstruction by data integration, as it requires combining coarse-grain + models (notably from cryo-electron microscopy (cryo-EM) and native mass spectrometry) with atomic models of + subunits obtained from X-ray crystallography. Fitting the latter into the former requires exploring the + conformation space of subunits, whence the importance of protein dynamics.

+

As an illustration of these three challenges, consider the problem of designing proteins blocking the entry of + SARS-CoV-2 into our cells (Fig. ). The first challenge is illustrated + by the problem of predicting the structure of a blocker protein from its sequence of amino-acids – a tractable + problem here since the mini proteins used only comprise of the order of 50 amino-acids (Fig. (A), ). The second + challenge is illustrated by the calculation of the binding modes and the binding affinity of the designed + proteins for the RBD of SARS-CoV-2 (Fig. (B)). Finally, the last challenge is + illustrated by the problem of solving structures of the virus with a cell, to understand how many spikes are + involved in the fusion mechanism leading to infection. In , the promising + designs suggested by modeling have been assessed by an array of wet lab experiments (affinity measurements, + circular dichroism for thermal stability assessment, structure resolution by cryo-EM). The hyperstable minibinders identified provide starting points for SARS-CoV-2 therapeutics  . We note in passing that this is truly remarkable work, yet, + the designed proteins stem from a template (the bottom helix from ACE2), and are rather + small.

+
+ Figure 2: The main challenges of molecular simulation: Finding significant local minima of the energy + landscape, computing statistical weights of catchment basins by integrating Boltzmann's factor, and + identifying transitions. Practically, $d > 100$. +
+
+
+ Protein dynamics: core CS - maths challenges. +

To present challenges in structural modeling, let us recall the following ingredients. First, a molecular model + with $n$ atoms + is parameterized over a conformational space $\mathcal{X}$ of dimension $d = 3n$ in Cartesian coordinates, or $d = 3n - 6$ in internal + coordinates upon removing rigid motions; these parameters are also called degrees of freedom (d.o.f.). Second, + recall that the potential energy landscape (PEL) is the mapping $V(\cdot)$ from $\mathbb{R}^d$ to $\mathbb{R}$ providing a + potential energy for each conformation  , . Example potential energies (PE) are CHARMM, AMBER, MARTINI, etc. Such PE belong to the realm of + molecular mechanics, and implement atomic or coarse-grain models. They may include a solvent model, either + explicit or implicit. Their definition requires a significant number of parameters (up to $1{,}000$), fitted to reproduce + physico-chemical properties of (bio-)molecules.

+

These PE are usually considered good enough to study non-covalent interactions – our focus, even though they do + not cover the modification of chemical bonds. In any case, we take such a function for granted.

+

The PEL codes all structural, thermodynamic, and kinetic properties, + which can be obtained by averaging properties of conformations over so-called thermodynamic + ensembles. The structure of a macromolecular system requires the characterization of + active conformations and important intermediates in functional pathways involving significant basins. In + assigning occupation probabilities to these conformations by integrating Boltzmann's distribution, one treats + thermodynamics. Finally, transitions between the states, modeled, say, by a master + equation (a continuous-time Markov process), correspond to kinetics. Classical simulation + methods based on molecular dynamics (MD) and Monte Carlo sampling (MC) are developed in the lineage of the + seminal work by the 2013 recipients of the Nobel prize in chemistry (Karplus, Levitt, Warshel), which was + awarded “for the development of multiscale models for complex chemical systems”. However, + except for highly specialized cases where massive calculations have been used , neither MD nor MC give access to the aforementioned time + scales. In fact, the main limitation of such methods is that they treat structural, thermodynamic and kinetic + aspects at once . The absence of specific insights on these three + complementary pieces of the puzzle makes it impossible to optimize simulation methods, and results in general in + the inability to obtain converged simulations on biologically relevant time-scales.
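As a minimal illustration of the thermodynamic step described above, the sketch below assigns occupation probabilities to a few states from their energies using Boltzmann weights. It is a simplification under stated assumptions: each basin is collapsed to a single energy value (the integration over the basin is ignored), energies are in kcal/mol, and the names `boltzmann_probabilities` and `K_B` are hypothetical and do not come from the report or from the SBL.

```python
import math

# Boltzmann constant in kcal/(mol*K); assumes energies are given in kcal/mol.
K_B = 0.0019872041


def boltzmann_probabilities(energies, temperature=300.0):
    """Return p_i = exp(-E_i / kT) / Z for a list of state energies.

    Assumption: each state (basin) is summarized by one energy value,
    so the integral of the Boltzmann factor over the basin is not computed.
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(weights)  # partition function Z, up to the common shift
    return [w / partition for w in weights]


if __name__ == "__main__":
    # Three hypothetical basins: one deep minimum and two higher ones.
    for energy, prob in zip([0.0, 1.0, 1.0], boltzmann_probabilities([0.0, 1.0, 1.0])):
        print(f"E = {energy:4.1f} kcal/mol  ->  p = {prob:.3f}")
```

Shifting all energies by the minimum leaves the probabilities unchanged and avoids overflow when energies are large in magnitude.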

+

The hardness of structural modeling owes to three intertwined reasons.

+

First, PELs of biomolecules usually exhibit a number of critical points exponential in the dimension; fortunately, they enjoy a multi-scale structure. Intuitively, the significant local minima/basins are those + which are deep or isolated/wide, two notions made mathematically + precise by the concepts of persistence and prominence. Mathematically, problems are plagued with the curse of + dimensionality and measure concentration phenomena. Second, biomolecular processes are inherently multi-scale, + with motions spanning $15$ and $4$ orders of magnitude in time and amplitude, respectively. Developing methods able to exploit this multi-scale + structure has remained elusive. Third, macroscopic properties of biomolecules, i.e. + observables, are average properties computed over ensembles of conformations, which calls for a multi-scale + statistical treatment both of thermodynamics and kinetics.

+
+
+ Validating models. +

A natural and critical question concerns the validation of models proposed in structural + bioinformatics. For all three types of questions of interest (structures, thermodynamics, kinetics), there exist + experiments against which the models must be checked – when the experiments can be conducted.

+

For structures, the models proposed can readily be compared against experimental results stemming from X-ray + crystallography, NMR, or cryo-electron microscopy. For thermodynamics, which we illustrate here with binding + affinities, predictions can be compared against measurements provided by calorimetry or surface plasmon + resonance. Lastly, kinetic predictions can also be assessed by various experiments such as binding affinity + measurements (for the prediction of $K_{on}$ and $K_{off}$), or fluorescence + based methods (for kinetics of folding).

+
+
+
+ Research program +

Our research program aims to develop a comprehensive set of novel concepts and algorithms to study protein + dynamics, based on the modular framework of PEL.

+
+ Modeling the dynamics of proteins + + + Molecular conformations + + + conformational exploration + + + energy landscapes + + + thermodynamics + + + kinetics. + + +

As noticed while discussing Protein dynamics: core CS - maths challenges, the integrated + nature of simulation methods such as MD or MC is such that these methods do not in general give access to + biologically relevant time scales. The framework of energy landscapes , (Fig. ) is much more + modular, yet, large biomolecular systems remain out of reach.

+

To make a definitive step towards solving the prediction of protein dynamics, we will serialize the discovery + and the exploitation of a PEL , , . Ideas and + concepts from computational geometry/geometric motion planning, machine learning, probabilistic algorithms, and + numerical probability will be used to develop two classes of probabilistic algorithms. The first deals with + algorithms to discover/sketch PELs i.e. enumerate all significant (persistent or prominent) + local minima and their connections across saddles, a difficult task since the number of all local + minima/critical points is generally exponential in the dimension. To this end, we will develop a hierarchical + data structure coding PELs as well as multi-scale proposals to explore molecular conformations. (Nb: in Monte + Carlo methods, a proposal generates a new conformation from an existing one.) The second focuses on methods to + exploit/sample PELs i.e. compute so-called densities of states, from which all thermodynamic + quantities are given by standard relations  + . This is a hard problem akin to high-dimensional + numerical integration. To solve this problem, we will develop a learning based strategy for the Wang-Landau + algorithm  –an adaptive Monte Carlo Markov Chain (MCMC) algorithm, as + well as a generalization of multi-phase Monte Carlo methods for convex/polytope volume calculations  , , for + non convex strata of PELs.
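To make the notion of density of states concrete, here is a minimal sketch of the standard Wang-Landau algorithm on a toy discrete system. It is not the learning-based variant mentioned above, and the system (independent two-state spins whose energy is the number of spins in state 1), the function name `wang_landau`, and all parameter values are assumptions chosen so that the exact answer, a binomial coefficient, is known.

```python
import math
import random


def wang_landau(n_spins=8, ln_f_final=1e-3, flatness=0.8, batch=2000, seed=0):
    """Estimate the density of states g(E) of a toy system (hypothetical example).

    The system has n_spins independent two-state spins and the energy E is the
    number of spins in state 1, so the exact answer is binomial(n_spins, E).
    """
    rng = random.Random(seed)
    state = [0] * n_spins
    energy = 0
    ln_g = [0.0] * (n_spins + 1)  # running estimate of ln g(E)
    ln_f = 1.0                    # modification factor, halved after each stage
    while ln_f > ln_f_final:
        hist = [0] * (n_spins + 1)
        flat = False
        while not flat:
            for _ in range(batch):
                i = rng.randrange(n_spins)
                new_energy = energy + (1 if state[i] == 0 else -1)
                # Accept the flip with probability min(1, g(E) / g(E')).
                delta = ln_g[energy] - ln_g[new_energy]
                if rng.random() < math.exp(min(0.0, delta)):
                    state[i] ^= 1
                    energy = new_energy
                ln_g[energy] += ln_f
                hist[energy] += 1
            # Stage ends once every energy level was visited often enough.
            flat = min(hist) >= flatness * sum(hist) / len(hist)
        ln_f /= 2.0
    # Normalize so that the estimates sum to the total number of states, 2^n.
    top = max(ln_g)
    log_total = top + math.log(sum(math.exp(v - top) for v in ln_g))
    shift = n_spins * math.log(2.0) - log_total
    return [math.exp(v + shift) for v in ln_g]


if __name__ == "__main__":
    estimate = wang_landau()
    for k, g in enumerate(estimate):
        print(f"E = {k}: estimated g = {g:7.2f}, exact g = {math.comb(8, k)}")
```

Once $g(E)$ is available, thermodynamic averages at any temperature follow by reweighting each energy level with its Boltzmann factor, which is why the density of states is the central quantity here.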

+
+
+ Algorithmic foundations: geometry, optimization, machine learning + + + Geometry + + + optimization + + + machine learning + + + randomized algorithms + + + sampling + + + optimization.. + + +

As discussed in the previous Section, the study of PEL and protein dynamics raises difficult algorithmic / + mathematical questions. As an illustration, one may consider our recent work on the comparison of high + dimensional distributions , statistical tests / + two-sample tests , , + the comparison of clusterings , the complexity study of + graph inference problems for low-resolution reconstruction of assemblies , the analysis of partition (or clustering) stability + in large networks, and the complexity of the representation of simplicial complexes . Making progress on such questions is + fundamental to advance the state of the art on protein dynamics.

+

We will continue to work on such questions, motivated by CSB / theoretical biophysics, both in the continuous + (geometric) and discrete settings. The developments will be based on a combination of ideas and concepts from + computational geometry, machine learning (notably on non linear dimensionality reduction, the reconstruction of + cell complexes, and sampling methods), graph algorithms, probabilistic algorithms, optimization, numerical + probability, and also biophysics.

+
+
+ Software: the Structural Bioinformatics Library + + + Scientific software + + + generic programming + + + molecular modeling.. + + +

While our main ambition is to advance the algorithmic foundations of molecular simulation, a major challenge + will be to ensure that the theoretical and algorithmic developments will change the fate of applications, as + illustrated by our case studies. To foster such a symbiotic relationship between theory, algorithms and + simulation, we will pursue high quality software development and integration within the SBL, + and will also take the appropriate measures for the software to be widely adopted.

+
+ Software in structural bioinformatics. +

Software development for structural bioinformatics is especially challenging, combining advanced geometric, + numerical and combinatorial algorithms, with complex biophysical models for PEL and related + thermodynamic/kinetic properties. Specific features of the proteins studied must also be accommodated. About + 50 years after the development of force fields and simulation methods (see the 2013 Nobel prize in chemistry), + the software implementing such methods has a profound impact on molecular science at large. One can indeed + cite packages such as CHARMM, AMBER, gromacs, gmin, MODELLER, Rosetta, VMD, PyMol, .... On the other hand, these packages are goal oriented, each tackling a (small set + of) specific goal(s). In fact, no real modular software design and integration has taken place. As a result, + despite the high quality software packages available, inter-operability between algorithmic building blocks + has remained very limited.

+
+
+ The SBL. +

Predicting the dynamics of large molecular systems requires the integration of advanced algorithmic building + blocks / complex software components. To achieve a sufficient level of integration, we undertook the + development of the Structural Bioinformatics Library (SBL, , a generic C++/python + cross-platform library providing software to solve complex problems in structural bioinformatics. For + end-users, the SBL provides ready to use, state-of-the-art applications to model + macro-molecules and their complexes at various resolutions, and also to store results in perennial and easy to + use data formats (). For developers, the SBL provides a + broad C++/python toolbox with modular design (). This hybrid status targeting both + end-users and developers stems from an advanced software design involving four software components, namely + applications, core algorithms, biophysical models, and modules (). This modular design makes it possible to optimize + robustness and the performance of individual components, which can then be assembled within a goal oriented + application.

+
+
+
+ Applications: modeling interfaces, contacts, and interactions + + + Protein interactions + + + protein complexes + + + structure/thermodynamics/kinetics prediction. + + +

Our methods will be validated on various systems for which flexibility operates at various scales. Examples of such + systems are antibody-antigen complexes, (viral) polymerases, and (membrane) transporters.

+

Even very complex biomolecular systems are deterministic in prescribed conditions (temperature, pH, etc), + demonstrating that despite their high dimensionality, not all d.o.f. are at play at the same + time. This insight suggests three classes of systems of particular interest. The first class consists of systems + defined from (essentially) rigid blocks whose relative positions change thanks to conformational changes of + linkers; a Newton cradle provides an interesting way to envision such a system. We have recently worked on one + such system, a membrane protein involved in antibiotic resistance (AcrB, see ). The second class consists of cases where relative + positions of subdomains do not significantly change, yet, their intrinsic dynamics are significantly altered. A + classical illustration is provided by antibodies, whose binding affinity owes to dynamics localized in six + specific loops , . + The third class, consisting of composite cases, will greatly benefit from insights on the first two classes. As + an example, we may consider the spikes of the SARS-CoV-2 virus, whose function (performing infection) involves + both large amplitude conformational changes and subtle dynamics of the so-called receptor binding domain. We + have started to investigate this system, in collaboration with B. Delmas (INRAe) .

+

In ABS, we will investigate systems in these three tiers, in collaboration with expert collaborators, to + hopefully open new perspectives in biology and medicine. Along the way, we will also collaborate on selected + questions at the interface between CSB and systems biology, as it is now clear that the structural level and the + systems level (pathways of interacting molecules) can benefit from one another.

+
+
+
+ Application domains +

The main application domain is Computational Structural Biology, as underlined in the Research + Program.

+
+
+ Highlights of the year +

In October 2021, Edoardo Sarti joined ABS as Chargé de Recherche de Classe Normale. His + expertise comprises a diverse set of interests spanning from algorithmic questions about geometrical, functional + and evolutionary aspects of biomolecules (latest study: ), to + the collection and analysis of large collections of molecular structural data. From the very start, E. Sarti has + taken part in several research and technical projects of ABS.

+
+
+ New software and platforms +

See report on the Structural Bioinformatics Library.

+
+ New software +
+ SBL + + + + + + + + + +
+
+
+
+ New results +
+ Modeling interfaces, contacts, and interactions + + + docking + + + scoring + + + interfaces + + + protein complexes + + + Voronoi diagrams + + + arrangements of balls. + + +
+ Boosting the analysis of protein interfaces with Multiple Interface String Alignments: illustration + on the spikes of coronaviruses + + + + + F. + Cazals + + + + + In collaboration with S. Bereux and B. Delmas (INRAe, Jouy-en-Josas). +

In this work , we introduce Multiple Interface + String Alignment (MISA), a visualization tool to display coherently various sequence and structure + based statistics at protein-protein interfaces (SSE elements, buried surface area, $\Delta ASA$, B factor values, etc). The + amino-acids supporting these annotations are obtained from Voronoi interface models. The benefit of MISA is to + collate annotated sequences of (homologous) chains found in different biological contexts i.e. bound with + different partners or unbound. The aggregated views MISA/SSE, MISA/BSA, MISA/$\Delta ASA$ etc make it trivial to identify + commonalities and differences between chains, to infer key interface residues, and to understand where + conformational changes occur upon binding. As such, they should prove of key relevance for knowledge based + annotations of protein databases such as the Protein Data Bank.

+

Illustrations are provided on the receptor binding domain (RBD) of coronaviruses, in complex with their + cognate partner or (neutralizing) antibodies. MISA computed with a minimal number of structures complement and + enrich findings previously reported.

+

The corresponding package is available from the Structural Bioinformatics Library ( +

+

and ).

+
+
+ SARS-CoV-2 Through the Lens of Computational Biology: How bioinformatics is playing a key role in the + study of the virus and its origins + + + + + F. + Cazals + + + + + In collaboration with Samuel Alizon (MIVEGEC - Maladies infectieuses et vecteurs : écologie, + génétique, évolution et contrôle), Stéphane Guindon (MAB - Méthodes et Algorithmes pour la Bioinformatique, + LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier), Claire Lemaitre + (GenScale - Scalable, Optimized and Parallel Algorithms for Genomics, Inria Rennes – Bretagne Atlantique), + Tristan Mary-Huard (INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et + l’Environnement), Anna Niarakis (Lifeware - Computational systems biology and optimization, Inria Saclay - Ile + de France; GenHotel - Laboratoire de recherche européen pour la polyarthrite rhumatoïde), Mikaël Salson + (CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189), Celine Scornavacca + (UMR ISEM - Institut des Sciences de l'Evolution de Montpellier), Hélène Touzet (CRIStAL - Centre de Recherche + en Informatique, Signal et Automatique de Lille - UMR 9189). +

In December 2019, the Chinese Center for Disease Control reported several cases of severe pneumonia + resisting usual treatments in the city of Wuhan. This was the beginning of the COVID-19 pandemic, which caused + more than 80 million infection cases and 1.7 million deaths during the year 2020 alone. This major + outbreak has given rise to global public health responses as well as an international research effort of + unprecedented scope and speed. This scientific mobilization has led to remarkable results, which have enabled + a great deal of knowledge to be accumulated in just a few months on this novel pathogen: identification of the + virus, of its main proteins, analysis of its origin and its functioning. This basic biological knowledge is + a prerequisite for medical advances: designing tests, finding a vaccine or a cure.

+

In this document , one year after the beginning of the worldwide + spread of the disease, we wish to shed particular light on the contribution of bioinformatics in all this + work. Bioinformatics is a discipline at the crossroads of computer science, mathematics and biology that has + taken on an inestimable importance in modern biology and medicine. It provides computational models, + algorithms and software to the scientific community, that are both operational and effective. The discovery + and study of the SARS-CoV-2 coronavirus is an emblematic example. The utilization of bioinformatics methods + has been at the heart of essential milestones: from the sequencing of the virus genome and its annotation to + the history of its origin, the modeling of interacting biological entities both at the molecular scale and + at the network scale, and the study of the host genetic susceptibility. All these studies, as a whole, have + made it possible to elucidate the nature and the functioning of the novel pathogen and have greatly + contributed to the fight against COVID-19.

+
+
+ Gene prioritization based on random walks with restarts and absorbing states, to define gene sets + regulating drug pharmacodynamics from single-cell analyses + + + + + F. + Cazals + + + + + D. + Mazauric + + + + + A. + Sales de Queiroz + + + + + G. + Sales Santa Cruz + + + + + In collaboration with Alain Jean-Marie (Inria Neo) and Jérémie Roux (Inserm and CNRS and UCA). +

Prioritizing genes for their role in drug sensitivity is an important step in understanding drug mechanisms + of action and discovering new molecular targets for co-treatment. In this work , we formalize this problem by considering + two sets of genes $X$ and $P$ respectively composing the predictive gene signature of sensitivity to a + drug and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) + containing the products of $X$ and $P$ as nodes. We introduce Genetrank, a method to prioritize + the genes in $X$ for their likelihood to regulate the genes in $P$.

+

+ Genetrank uses asymmetric random walks with restarts, absorbing states, and a suitable + renormalization scheme. Using novel so-called saturation indices, we show that the conjunction of absorbing + states and renormalization yields an exploration of the PPIN which is much more progressive than that afforded + by random walks with restarts only. Using MINT as underlying network, we apply Genetrank to + a predictive gene signature of cancer cells sensitivity to tumor-necrosis-factor-related apoptosis-inducing + ligand (TRAIL), performed in single-cells. Our ranking provides biological insights on drug sensitivity and a + gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most + significant differentially expressed genes obtained from a statistical analysis framework alone. We also + introduce gene expression radars, a visualization tool to assess all pairwise interactions + at a glance.
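To give a feel for how restarts and absorbing states interact, the following sketch propagates probability mass on a small undirected graph. It is a generic illustration only: the asymmetric transitions, the renormalization scheme, and the saturation indices of Genetrank are not reproduced, and the function names `absorbed_mass` and `rank_sources`, the toy graph, and the parameter values are all hypothetical.

```python
def absorbed_mass(graph, source, targets, restart=0.3, steps=50):
    """Probability that a walker started at `source` is absorbed in `targets`
    within `steps` steps.

    At each step the walker jumps back to `source` with probability `restart`,
    otherwise it moves to a uniformly chosen neighbor. Reaching a node of
    `targets` ends the walk (absorbing state).
    Assumptions: `source` is not a target and every node has a neighbor.
    """
    targets = set(targets)
    mass = {source: 1.0}  # probability mass still walking, per node
    absorbed = 0.0
    for _ in range(steps):
        nxt = {}
        for node, m in mass.items():
            nxt[source] = nxt.get(source, 0.0) + restart * m
            neighbors = graph[node]
            share = (1.0 - restart) * m / len(neighbors)
            for nb in neighbors:
                if nb in targets:
                    absorbed += share
                else:
                    nxt[nb] = nxt.get(nb, 0.0) + share
        mass = nxt
    return absorbed


def rank_sources(graph, sources, targets, restart=0.3, steps=50):
    """Order sources by decreasing absorbed mass (ties keep input order)."""
    score = {s: absorbed_mass(graph, s, targets, restart, steps) for s in sources}
    return sorted(sources, key=lambda s: -score[s])


if __name__ == "__main__":
    # Toy path graph a - b - c - t, with a single absorbing target t.
    toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "t"], "t": ["c"]}
    print(rank_sources(toy, ["a", "c"], {"t"}))
```

In this toy setting, a source adjacent to the target set accumulates absorbed mass faster than a distant one, which is the kind of signal a prioritization of $X$ with respect to $P$ can build on.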

+

+ Genetrank is made available in the Structural Bioinformatics Library (). It should prove useful for mining gene sets in + conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.

+
+
+
+ Modeling the dynamics of proteins + + + protein + + + flexibility + + + collective coordinate + + + conformational sampling dimensionality reduction. + + +
+ Tripeptide loop closure: a detailed study of reconstructions based on Ramachandran + distributions + + + + + F. + Cazals + + + + + T. + O'Donnell + + + + + In collaboration with C. Robert (IBPC / CNRS, Paris, France). +

Tripeptide loop closure (TLC) is a standard procedure to reconstruct protein backbone conformations, by + solving a polynomial system in a single variable yielding up to 16 real solutions.

+

In this work , we first show that multiprecision is required in + a TLC solver to guarantee the existence and the accuracy of solutions. We then compare solutions yielded by + the TLC solver against tripeptides from the Protein Data Bank. We show that these solutions are geometrically + diverse (up to $3\,\text{Å}$ RMSD with respect to the data), and sound in terms + of potential energy. Finally, we compare Ramachandran distributions of data and reconstructions for the three + amino acids. The distribution of reconstructions in the second angular space $(\phi_2, \psi_2)$ + stands out, with a rather uniform distribution leaving a central void.

+

We anticipate that these insights, coupled to our robust implementation in the Structural Bioinformatics + Library (), will help understand the + properties of TLC reconstructions, with potential applications to the generation of conformations of flexible + loops in particular.

+
+
+
+ Algorithmic foundations + + + Computational geometry + + + computational topology + + + optimization + + + data analysis. + + +
+ Frechet mean and p-mean on the unit circle: characterization, decidability, and algorithm + + + + + F. + Cazals + + + + + T. + O'Donnell + + + + + In collaboration with B. Delmas (INRAe, Jouy-en-Josas). +

The center of mass of a point set lying on a manifold generalizes the celebrated Euclidean centroid, and is + ubiquitous in statistical analysis in non Euclidean spaces.

+

In this work, we give a complete characterization of the weighted $p$-mean of a finite set of angular values on $S^1$, based on a decomposition of $S^1$ such that the functional of interest has at most one local minimum per cell. This characterization is used to show that the problem is decidable for rational angular values (a consequence of Lindemann's theorem on the transcendence of $\pi$), and to develop an effective algorithm parameterized by exact predicates. A robust implementation of this algorithm based on multi-precision interval arithmetic is also presented, and is shown to be effective for large values of $n$ and $p$. We use it as a building block to implement the k-means and k-means++ clustering algorithms on the flat torus, with applications to clustering protein molecular conformations. These algorithms are available in the Structural Bioinformatics Library.
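To make the cell decomposition concrete, here is a minimal sketch restricted to the unweighted case $p = 2$, in plain floating point. It is not the exact-predicate, interval-arithmetic algorithm described above. It relies on the fact that, inside a cell, the functional is a sum of squared shifted differences, so its only critical point is the arithmetic mean shifted by a multiple of $2\pi/n$. The function names are illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def geodesic_distance(a, b):
    """Length of the shortest arc between two angles on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def frechet_functional(theta, angles):
    """Sum of squared geodesic distances from theta to the data (p = 2)."""
    return sum(geodesic_distance(theta, a) ** 2 for a in angles)


def frechet_mean_circle(angles):
    """Unweighted Frechet mean (p = 2) of angles given in radians.

    The n candidates mean + 2*pi*k/n (k = 0..n-1) cover every possible
    local minimum; the best one is selected by direct evaluation.
    Floating point only: near-ties are not resolved reliably.
    """
    n = len(angles)
    if n == 0:
        raise ValueError("at least one angle is required")
    reduced = [a % TWO_PI for a in angles]
    base = sum(reduced) / n
    candidates = [(base + TWO_PI * k / n) % TWO_PI for k in range(n)]
    return min(candidates, key=lambda t: frechet_functional(t, reduced))
```

For instance, `frechet_mean_circle([6.2, 0.1])` returns a value close to 0.0084 rather than the naive average 3.15, because the two angles sit on either side of the origin. This enumeration costs $O(n^2)$ evaluations and is meant for illustration only.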

+

Our derivations are of interest in two respects. First, efficient $p$-mean calculations are relevant to develop principal components analysis on the flat torus encoding angular spaces, a particularly important case to describe molecular conformations. Second, our two-stage strategy stresses the interest of combinatorial methods for $p$-means, also emphasizing the role of numerical issues.

+
+
+ Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary reflections and + sweet arithmetics + + + + + F. + Cazals + + + + + A. + Chevallier + + + + + In collaboration with S. Pion IMS (Univ. Bordeaux / Bordeaux INP / CNRS UMR 5218). +

Computing the volume of a high dimensional polytope is a fundamental problem in geometry, also connected to + the calculation of densities of states in statistical physics, and a central building block of such algorithms + is the method used to sample a target probability distribution.

+

This paper studies Hamiltonian Monte Carlo (HMC) with reflections on the boundary of a domain, providing an enhanced alternative to Hit-and-run (HAR) to sample a target distribution restricted to the polytope. We make three contributions. First, we provide a convergence bound, paving the way to more precise mixing time analysis. Second, we present a robust implementation based on multi-precision arithmetic, a mandatory ingredient to guarantee exact predicates and robust constructions. We however allow controlled failures to happen, introducing the Sweeten Exact Geometric Computing (SEGC) paradigm. Third, we use our HMC random walk to perform H-polytope volume calculations, using it as an alternative to HAR within the volume algorithm by Cousins and Vempala. The systematic tests conducted up to dimension $n = 100$ on the cube, the isotropic and the standard simplex show that HMC significantly outperforms HAR both in terms of accuracy and running time. Additional tests show that calculations may be handled up to dimension $n = 500$. These tests also establish that multiprecision is mandatory to avoid exits from the polytope.
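The reflection step can be illustrated with a small sketch. It assumes a uniform target, for which HMC trajectories are straight lines, and an H-polytope given as `A x <= b`. It uses ordinary floating point, which is precisely the setting where exits from the polytope can occur, so it should not be read as the robust implementation described above. The function names are hypothetical.

```python
def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def reflect_trajectory(x, v, travel_time, A, b):
    """Move x along v for travel_time inside {y : A y <= b}.

    Each time a facet a_i . y = b_i is reached, the velocity is mirrored
    with respect to that facet. Returns the final position and velocity.
    """
    x, v = list(x), list(v)
    remaining = travel_time
    while remaining > 0.0:
        hit_time, hit_row = remaining, None
        for a_i, b_i in zip(A, b):
            speed = dot(a_i, v)
            if speed > 0.0:  # only facets we are moving towards
                t = (b_i - dot(a_i, x)) / speed
                if t < hit_time:
                    hit_time, hit_row = t, a_i
        hit_time = max(hit_time, 0.0)  # guard against tiny negative times
        x = [xi + hit_time * vi for xi, vi in zip(x, v)]
        remaining -= hit_time
        if hit_row is None:
            break
        scale = 2.0 * dot(hit_row, v) / dot(hit_row, hit_row)
        v = [vi - scale * ai for vi, ai in zip(v, hit_row)]
    return x, v
```

In the unit square, starting at (0.5, 0.5) with velocity (1, 0) and travel time 0.75, the point reaches the facet x = 1 after 0.5 time units, bounces back, and stops at (0.75, 0.5).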

+
+
+ Overlaying a hypergraph with a graph with bounded maximum degree, with application for low-resolution + reconstructions of molecular assemblies + + + + + D. + Mazauric + + + + + In collaboration with F. Havet, T. V. H. Nguyen laboratoire I3S (CNRS, Université Côte d'Azur). +

We analyze a generalization of the minimum connectivity inference problem (MCI) that models the computation of low-resolution structures of macro-molecular assemblies, based on data obtained by native mass spectrometry. The generalization studied in this work allows us to consider more refined constraints for the characterization of low-resolution models of large assemblies, such as degree constraints (e.g. a protein has a limited number of other proteins in contact).

+

More precisely, let $G$ and $H$ be respectively a graph and a hypergraph defined on the same set of vertices, and let $F$ be a graph. We say that $G$ $F$-overlays a hyperedge $S$ of $H$ if the subgraph of $G$ induced by $S$ contains $F$ as a spanning subgraph, and that $G$ $F$-overlays $H$ if it $F$-overlays every hyperedge of $H$. For a fixed graph $F$ and a fixed integer $k$, the problem $(\Delta \leq k)$-$F$-Overlay consists in deciding whether there exists a graph with maximum degree at most $k$ that $F$-overlays a given hypergraph $H$. We prove that for any graph $F$ which is neither complete nor anticomplete, there exists an integer $np(F)$ such that $(\Delta \leq k)$-$F$-Overlay is $\mathcal{NP}$-complete for all $k \geq np(F)$.
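The definition can be checked mechanically on small instances. The sketch below only verifies a candidate graph. It does not solve the decision problem, which the text states is hard for most choices of $F$. It assumes that every hyperedge has exactly as many vertices as $F$ and that the edge list of $G$ has no duplicates. All names are illustrative.

```python
from itertools import permutations


def overlays_hyperedge(g_edges, hyperedge, f_vertices, f_edges):
    """True if G restricted to the hyperedge contains F as a spanning subgraph.

    Brute force over all bijections from V(F) to the hyperedge.
    """
    targets = list(hyperedge)
    if len(targets) != len(f_vertices):
        return False  # a spanning subgraph needs the same number of vertices
    g = {frozenset(e) for e in g_edges}
    for image in permutations(targets):
        mapping = dict(zip(f_vertices, image))
        if all(frozenset((mapping[a], mapping[b])) in g for a, b in f_edges):
            return True
    return False


def max_degree(g_edges):
    """Maximum degree of a simple graph given as a list of edges."""
    degree = {}
    for u, v in g_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values(), default=0)


def is_bounded_overlay(g_edges, hyperedges, f_vertices, f_edges, k):
    """True if G has maximum degree at most k and F-overlays every hyperedge."""
    return max_degree(g_edges) <= k and all(
        overlays_hyperedge(g_edges, s, f_vertices, f_edges) for s in hyperedges
    )
```

With $G$ the path a-b-c, a single hyperedge {a, b, c} and $F$ a path on three vertices, the check succeeds for $k = 2$ and fails for $k = 1$, since vertex b has degree 2.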

+
+
+ Conflict coloring problems: complexity and application to high resolution biological assembly + modeling + + + + + F. + Cazals + + + + + D. + Mazauric + + + + + In collaboration with F. Havet, T. V. H. Nguyen laboratoire I3S (CNRS, Université Côte d'Azur). +

Given a graph $G = (V, E)$, a color set $C(v)$ for each vertex $v \in V$, and a bipartite graph between color sets $C(u)$ and $C(v)$ for every edge $uv \in E$, Conflict Coloring consists in deciding whether there exists a conflict coloring, that is, a coloring in which $c(u)c(v)$ is not an edge of the bipartite graph associated with $uv$, for every edge $uv \in E$. Conflict Coloring is motivated by computational structural biology problems, in particular the high resolution determination of molecular assemblies. The graph represents the subunits and the interactions between them, the colors are the given conformations, and the edges of the bipartite graphs are the incompatible conformations of two subunits.
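As an illustration of the problem statement, the following sketch searches for a conflict coloring by exhaustive enumeration. It is exponential in the number of vertices and only meant for toy instances. The data layout (a dictionary of color lists and a dictionary of forbidden color pairs per edge) is an assumption made for this example.

```python
from itertools import product


def find_conflict_coloring(vertices, edges, colors, conflicts):
    """Return a dict vertex -> color avoiding all conflicts, or None.

    colors[v] lists the admissible colors (conformations) of vertex v.
    conflicts[(u, v)] is the set of forbidden pairs (color of u, color of v),
    i.e. the edges of the bipartite graph attached to the edge uv.
    """
    for choice in product(*(colors[v] for v in vertices)):
        coloring = dict(zip(vertices, choice))
        if all(
            (coloring[u], coloring[v]) not in conflicts[(u, v)]
            for u, v in edges
        ):
            return coloring
    return None
```

For two subunits u and v with `colors = {"u": [0, 1], "v": [0]}` and the single forbidden pair (0, 0), the search returns the coloring that assigns 1 to u and 0 to v.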

+

In this work, we first establish the complexity dichotomies (polynomial vs $\mathcal{NP}$-complete) for Conflict Coloring and its variants. We then report experiments in which we build instances of Conflict Coloring associated with Voronoi diagrams in the plane, and analyse the existence of a solution as a function of the parameters used in our experimental setup.

+
+
+
+
+ Partnerships and cooperations + + + + + F. + Cazals + + + + + D. + Mazauric + + + + +
+ International research visitors +
+ Visits of international scientists +
+ Inria International Chair + + David Wales, Cambridge University, holds an endowed chair within 3IA Côte d'Azur / ABS. + +
+
+
+
+
+ Dissemination +
+ Promoting scientific activities +
+ Scientific events: organisation +

+ + + + + Frédéric Cazals was involved in the organization of:

+ + Symposium Multidisciplinary approaches in cancer research, Organized at Inria Sophia + Antipolis Méditerranée. Web: . + Winter School Machine Learning Methods to Analyze and Predict Protein Structure, Dynamics + and Function, CIRM, Luminy, November 7-12, 2021. Web: . + + Critical evaluation of methods for scoring interfaces of protein complexes, Online Elixir + 3D-Bioinfo meeting, organized by Emmanuel Levy (Elixir IL), Frederic Cazals (Elixir FR), Shoshana Wodak + (Elixir BE). + +
+
+ Scientific events: selection +
+ Member of the conference program committees +

Frédéric Cazals participated in the following program committees:

+ + Symposium on Solid and Physical Modeling + Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology + (ECCB) + +
+
+
+ Invited talks +

+ + + + + Frédéric Cazals gave the following invited talks:

+ + + Mining protein flexibility: a new class of move sets; GDR BIM/GT MASIM, November 2021; + UCA, 5th Academy 4 Research Webinar - Mental Retardation and Protein Dynamics, October 2021. + +
+
+ Leadership within the scientific community +

+ + + + + Frédéric Cazals

+ + 2010-...: Member of the steering committee of the GDR Bioinformatique Moléculaire, for the Structure and + macro-molecular interactions theme. + 2017-...: Co-chair, with Yann Ponty, of the working group / groupe de travail (GT MASIM - Méthodes + Algorithmiques pour les Structures et Interactions Macromoléculaires), within the GDR de Bioinformatique + Moléculaire (GDR BIM, ). + +
+
+ Research administration +

+ + + + + Frédéric Cazals

+ + 2018-...: Member of the bureau du comité des équipes projets. + 2020-...: Member of the bureau of the EUR Life, Université Côte d’Azur. + +

+ + + + + Dorian Mazauric

+ + 2019-...: Member of the comité Plateformes. + +
+
+
+ Teaching - Supervision - Juries +
+ Teaching + + 2014–...: Master Data Sciences Program (M2), Department of Applied Mathematics, Ecole Centrale-Supélec; + Foundations of Geometric Methods in Data Analysis; F. Cazals and M. Carrière, Inria + Sophia / (ABS, DataShape). Web: . + 2021–...: Master Data Sciences & Artificial Intelligence (M1), Université Côte d’Azur; Introduction to machine learning (course practicals); E. Sarti. + 2021–...: Master Data Sciences & Artificial Intelligence (M2), Université Côte d’Azur; Geometric and topological methods in machine learning; F. Cazals, J-D. Boissonnat and M. Carrière, + Inria Sophia / (ABS, DataShape, DataShape); Web: . + 2021–...: Master Cancérologie et Recherche Translationnelle (M2), Université Côte d’Azur; Binding affinity maturation and protein interaction network analysis: two examples of bioinformatics + applications in medicine; F. Cazals. + 2020–...: Master Sciences du Vivant (M2), parcours Biologie, Informatique, Mathématiques, Université Côte + d’Azur; Introduction to statistical physics of biomolecules; F. Cazals. + 2018–...: Master : Algorithmique et Complexité, 23h30 TD, niveau M1, Polytech Nice Sophia, Université Côte + d'Azur, filière Sciences Informatiques, France; Dorian Mazauric. + +
+
+ Supervision +

PhD thesis:

+ + + PhD in progress, 3rd year: Timothée O'Donnell, Modeling the influenza + polymerase. Université Côte d'Azur. Thesis co-supervised by Frédéric Cazals and Bernard Delmas, INRA + Jouy-en-Josas. + + Defended PhD: Thi Viet Ha Nguyen, Graph Algorithms techniques for (low + and high) resolution models of large protein assemblies. Université Côte d'Azur. Thesis co-supervised + by Frédéric Havet, Laboratoire I3S (CNRS, Université Côte d'Azur). + +

Interns:

+ + Aarushi Gupta, intern from IIT Delhi, summer 2021. Modeling protein backbone flexibility + using solutions of the tripeptide loop closure. + + Louis Goldenberg, intern from Ecole Polytechnique, summer 2021. Parametric models for + compact clusters. + + Sebastián Gallardo Diaz, Universidad Técnica Federico Santa María, Valparaíso, Chile. Advisors: Pierre + Kornprobst (Inria project-team Biovision), Dorian Mazauric. Algorithms for a new packing + problem: Towards Reading Accessible Newspapers. + Vivian Losciale, Université Côte d'Azur. Advisors: Jérémy Camponovo, Frédéric Havet, Buntheng Ly, Dorian + Mazauric, Maxime Sermesant. Jeux-vidéos de médiation : Intelligence artificielle pour + l’imagerie médicale. + Quentin Larose, Université Côte d'Azur. Advisors: Agnès Bessière, Carole Clastres, Jérémy Camponovo, Luc + Hogie, Dorian Mazauric, Eric Pascual, Sandrine Selosse, Brigitte Trousse. Portail des + ressources Terra Numerica. + +
+
+ Juries +

Frédéric Cazals participated in the following committees:

+ + Luke Dicks, Cambridge University, April 2021. Rapporteur for the PhD thesis K-means + landscapes: exploring clustering solution spaces using energy landscape theory. Advisor: David + Wales. + Manon Ruffini, Univ. of Toulouse, March 2021. Rapporteur on the PhD thesis Models and + Algorithms for Computational Protein Design. Advisor: Thomas Schiex. + Dorian Mazauric, Habilitation thesis, Université Côte d'Azur, November 2021. Committee member (president) + for the habilitation Algorithmique des graphes pour les réseaux et la biologie structurale + computationnelle. + +

Dorian Mazauric participated in the following committees:

+ + Thi Viet Ha Nguyen, Université Côte d'Azur, December 2021. Committee member for the PhD thesis Graph Algorithms techniques for (low and high) resolution models of large protein + assemblies. Advisors: Frédéric Havet, Dorian Mazauric. + +
+
+
+ Popularization +
+ Internal or external Inria responsibilities +

Dorian Mazauric:

+ + 2019–...: Head of Commission (Médiation et Animation des + MAthématiques, des Sciences et Techniques Informatiques et des Communications), Inria Sophia Antipolis - + Méditerranée. + 2019–...: Coordinator of Terra Numerica, an ambitious scientific + popularisation project. Its main goal is to create a "Dedicated Digital space" in the south of France (in + the spirit of the "Cité des Sciences" or "Palais de la découverte" in Paris). To do so, Terra Numerica is + developing and structuring popularisation activities and materials, which are spread across different sites + throughout the territory (e.g. Espace Terra Numerica - Valbonne Sophia Antipolis, MIA, in schools, + exhibition extensions...). This large-scale project brings together all the actors of research, + education, industry, associations and local authorities. It currently comprises more than one hundred + people. + 2018–...: Member of the board of directors of the association Les Petits Débrouillards. + 2017–...: Member of the outreach project Galéjade : Graphes et ALgorithmes : Ensemble de Jeux À Destination + des Ecoliers... (mais pas que). + +
+
+ Articles and contents +

Frédéric Cazals:

+ + Podcast Investiga’Sciences + Vive la protéine: interview-discussion of + Thomas Schiex and myself by Valérie Ravinet, October 2021. . + +

Dorian Mazauric:

+ + Participation in the development of . + Participation in the development of popularization video games . + +
+
+ Interventions +

Dorian Mazauric - Fête de la Science 2021:

+ + Village des Sciences de Villeneuve-Loubet, with Pobot. Saturday 2 and Sunday 3 October 2021. With Thomas Dissaux, Adrien Gausseran, Nicolas Nisse, Eric Pascual, Lucas Picassari-Arrieta, Brigitte Trousse. + Village des sciences de la vallée de la Vésubie, with Les Apprentis Pas Sages. Saturday 2 October 2021. Honeycomb puzzle – Life-size graphs and algorithms. With Samantha Lanney-Ricci, Magali Martin-Mazauric. + Sessions at the Campus International de Valbonne. Monday 4 October 2021. The magic of binary – No need to think, computers calculate so fast? Current problems in algorithmics. With Estelle Zavoli. + Scientific workshop at the Espace d’Art Concret (EAC), Mouans-Sartoux. Organized by the EAC (Amandine Briand, Sabrina Lah, Martin Merle, Claire Spada, Brigitte Segatori, Roubaud). From Monday 4 to Friday 8 October 2021. Queens on a work of art (Tenth Copper Corner, a minimalist work by Carl André made of 55 tiles): mathematics and algorithmics. With Frédéric Havet, Nicolas Nisse, Martine Olivi. In collaboration with Geoffroy Aubry and Valérie Doya (physics workshop). + Session at the collège La chênaie, Mouans-Sartoux. Wednesday 6 October 2021. The magic of graphs and binary – No need to think, computers calculate so fast? Current problems in algorithmics. With Mylène Raibaudi, Brigitte Trousse. + Village des Sciences de Mouans-Sartoux. Saturday 9 October 2021. With Sabrina Barnabé, Martine Olivi, Brigitte Trousse, Thierry Viéville. + Festival des sciences de Nice, Université Côte d’Azur. Saturday 9 and Sunday 10 October 2021. With Alexandre Bonlarron, Foivos Fioravantes, Victor Jung, Hicham Lesfari, Steve Malalel, Magali Martin-Mazauric, Romain Michelucci, Nicolas Nisse, Marie Pelleau, Nina Singlan, Rudan Xiao. + Sessions at the collège de Roquebillière. Thursday 7 October 2021. The magic of graphs and binary – Combinatorial games – No need to think, computers calculate so fast? Current problems in algorithmics – Workshops on games, graphs and algorithms. With Samantha Lanney-Ricci. + Lecture at the médiathèque de Biot. Friday 8 October 2021. + Village des Sciences et de l’Innovation de la CASA, Antibes Juan-les-Pins, with PoBot, SLV, @b4games. Saturday 16 and Sunday 17 October 2021. With Agnès Bessière, Armel Berceliot, Étienne Chaplain, Thomas Dissaux, Thierry Lespinasse, Stéphane Mansour, Magali Martin-Mazauric, Nicolas Nisse, Eric Pascual, Lucas Picassari-Arrieta, Frédéric Rallo, Sandrine Selosse, Brigitte Trousse. + +

Dorian Mazauric - Interventions at Maison de l'Intelligence Artificielle:

+ + Terra Numerica workshops with the students of the Master SmartEdTech. Wednesday 14 April 2021. Intensive day of hybrid training, facilitation and co-creation of workshops. With Saint-Clair Lefevre, Frédéric Havet, Margarida Romero, Thierry Viéville. + +

Dorian Mazauric - Cordées de la réussite (coordinated by Université Côte d'Azur):

+ + Two classes from the collège Henri Nans, Aups. Digital sciences within reach! Discover, Explore, Experiment! Pirates and treasure: from maths and algorithms to Scratch and mBot programming. With Frédéric Havet, Eric Pascual, Brigitte Trousse. + +

Dorian Mazauric - Programme Chiche:

+ + Session at the lycée Apollinaire, Nice. Thursday 14 October 2021. + Session at the lycée Estienne d’Orves, Nice. Thursday 21 October 2021. + Session at the CIV, Valbonne Sophia Antipolis. Thursday 2 December 2021. + +

Dorian Mazauric - Training sessions:

+ + Teacher training co-organized by the DANE and Terra Numerica, with the Terra Numerica workshops at the Maison de l’Intelligence Artificielle. Tuesday 9 March, Tuesday 23 March, Tuesday 6 April, Tuesday 20 April and Tuesday 25 May 2021. Reinforcement learning machine to win at games, introduction to image recognition with drones, unplugged computer science workshops. With Jérémy Camponovo, Frédéric Havet, Eric Pascual, Brigitte Trousse. + Training of media library staff of the CASA. Friday 25 June 2021 and Thursday 23 September 2021. Training on the foundations of computer science: Thought transmission – The magic of binary. + Presentation and training at the Fab’Ecole 06 of the DRANE, collège Bertone, Antibes. Friday 26 November 2021. Presentation of and training on Terra Numerica workshops. With Brigitte Trousse. + +

Dorian Mazauric - In schools:

+ + Collège Bechet, Antibes Juan-les-Pins. Monday 8 March 2021. As part of the educational project Ethique des données et de l’information (1/3). Introduction to algorithms. With Sylvain Etienne, Frédéric Giroire, Géraldine Rouard, Brigitte Trousse. + Centre International de Valbonne Sophia Antipolis. Monday 15 March 2021. As part of sessions on artificial intelligence with a final-year class of the CIV, organized by Les Petits Débrouillards. Artificial intelligence and image recognition. With Marie Barbieux, Marine Beaudet, Soledad Tolosa. + Collège Bechet, Antibes Juan-les-Pins. Friday 26 March 2021 and 9 April 2021. As part of the educational project Ethique des données et de l’information (2/3). Modeling a social network and its contents, and recommendation algorithms. With Sylvain Etienne, Frédéric Giroire, Géraldine Rouard, Brigitte Trousse. + Collège Bechet, Antibes Juan-les-Pins. Monday 7 June 2021. As part of the educational project Ethique des données et de l’information (3/3). Lecture on data protection and the role of Data Protection Officer at Inria (Anne Combe). With Anne Combe, Sylvain Etienne, Frédéric Giroire, Géraldine Rouard, Brigitte Trousse. + Roquefort-les-Pins, as part of the activities of the town's holiday day-care centre. Monday 26 and Tuesday 27 July 2021. Three half-days: unplugged computer science workshops (for ages 3 to 6), workshops to discover recommendation algorithms in social networks (for teenagers) and magic tricks to discover how a computer counts (for ages 6 to 10). With Frédéric Havet. + Lycée International de Valbonne. Thursday 2 December 2021. Life-size algorithmic workshops. With Bérengère Abric, Perrine Le Dûs. + +

Dorian Mazauric - Internships:

+ + Thirteen interns from the troisième grade (final year of collège) at the Inria centre at Université Côte d’Azur. From Monday 13 to Friday 17 December 2021. + +
+
+
+ + +
+ {@titre} + + + Distributed Link Scheduling in Wireless Networks + + + V. + Vishal + Misra + + + P. + Philippe + Nain + + + hal-01977266 + 10.1142/S1793830920500585%EF%BB%BF + + + Discrete Mathematics, Algorithms and Applications + + 2020 + 12 + 5 + 1-38 + + + + + + On the complexity of the representation of simplicial complexes by trees + + + J.-D. + Jean-Daniel + Boissonnat + + + hal-01259806 + 10.1016/j.tcs.2015.12.034 + + + Theoretical Computer Science + + February 2016 + 617 + 17 + + + + + + Energy landscapes and persistent minima + + + J. + J. + Carr + + + D. + D. + Mazauric + + + F. + F. + Cazals + + + D. J. + D. J. + Wales + + + 10.1063/1.4941052 + + + The Journal of Chemical Physics + + 2016 + 144 + 5 + 4 + + + + + + Conformational Ensembles and Sampled Energy Landscapes: Analysis and Comparison + + + F. + F. + Cazals + + + T. + T. + Dreyfus + + + D. + D. + Mazauric + + + A. + A. + Roth + + + C. + C.H. + Robert + + + 10.1002/jcc.23913 + + + J. of Computational Chemistry + + 2015 + 36 + 16 + 1213--1231 + + + + + + The Structural Bioinformatics Library: modeling in biomolecular science and beyond + + + F. + Frédéric + Cazals + + + T. + Tom + Dreyfus + + + hal-01379635 + + + + October 2016 + RR-8957 + + + + + + Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces + + + F. + Frédéric + Cazals + + + hal-01245408 + + + IEEE/ACM International Conference on Data Science and Advanced Analytics + IEEE/ACM International Conference on Data Science and Advanced Analytics + + Paris, France + + + March 2015 + 29 + + + + + + Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees + + + F. + Frédéric + Cazals + + + hal-02425602 + + + NeurIPS 2019 - Thirty-third Conference on Neural Information Processing Systems + + Vancouver, Canada + + + December 2019 + + + + + + Comparing Two Clusterings Using Matchings between Clusters of Clusters + + + F. + Frédéric + Cazals + + + D. + Dorian + Mazauric + + + R. 
+ Romain + Tetley + + + R. + Rémi + Watrigant + + + hal-02425599 + 10.1145/3345951 + + + ACM Journal of Experimental Algorithmics + + December 2019 + 24 + 1 + 1-41 + + + + + + Complexity dichotomies for the Minimum F -Overlay problem + + hal-01947563 + 10.1016/j.jda.2018.11.010 + + + Journal of Discrete Algorithms + + September 2018 + 52-53 + 133-142 + + + + + + A Sequential Non-Parametric Multivariate Two-Sample Test + + + F. + Frédéric + Cazals + + + hal-01968190 + + + IEEE Transactions on Information Theory + + May 2018 + 64 + 5 + 3361-3370 + + + + + + High Resolution Crystal Structures Leverage Protein Binding Affinity Predictions + + + S. + Simon + Marillet + + + F. + Frédéric + Cazals + + + hal-01159641 + + + + March 2015 + RR-8733 + + + + + + Novel Structural Parameters of Ig–Ag Complexes Yield a Quantitative Description of Interaction + Specificity and Binding Affinity + + + S. + Simon + Marillet + + + M.-P. + Marie-Paule + Lefranc + + + F. + Frédéric + Cazals + + + hal-01675467 + 10.3389/fimmu.2017.00034 + + + Frontiers in Immunology + + February 2017 + 8 + 34 + + + + + + Hybridizing rapidly growing random trees and basin hopping yields an improved exploration of + energy landscapes + + + A. + A. + Roth + + + T. + T. + Dreyfus + + + C. + C.H. + Robert + + + F. + F. + Cazals + + + 10.1002/jcc.24256 + + + J. Comp. Chem. + + 2016 + 37 + 8 + 739--752 + + + + + + Studying dynamics without explicit dynamics: A structure‐based study of the export mechanism by + AcrB + + + I. + Isabelle + Mus‐Veteau + + + F. + Frédéric + Cazals + + + hal-03006981 + 10.1002/prot.26012 + + + Proteins - Structure, Function and Bioinformatics + + September 2020 + + + + + + Boosting the analysis of protein interfaces with Multiple Interface String Alignments: + illustration on the spikes of coronaviruses + + + S. + Stéphane + Bereux + + + B. + B + Delmas + + + F. 
+ Frédéric + Cazals + + + hal-03387889 + + + Proteins - Structure, Function and Bioinformatics + + November 2021 + + + + + + Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary + reflections and sweet arithmetics + + + A. + Augustin + Chevallier + + + F. + Frédéric + Cazals + + + hal-03048725 + + + Journal of Computational Geometry + + 2022 + + + + + + Tripeptide loop closure: a detailed study of reconstructions based on Ramachandran + distributions + + + T. + T + O'donnell + + + C. H. + C H + Robert + + + F. + F + Cazals + + + hal-03232851 + + + Proteins - Structure, Function and Bioinformatics + + 2022 + + + + + + Fréchet mean and <formula rend="inline"> + <math xmlns="http://www.w3.org/1998/Math/MathML"> + <mi>p</mi> + </math> + </formula>-mean on the unit circle: decidability, algorithm, and applications to + clustering on the flat torus + + + F. + Frédéric + Cazals + + + B. + B + Delmas + + + T. + Timothee + O'donnell + + + hal-03183028 + + + SEA 2021 - 19th Symposium on Experimental Algorithms + + Sophia Antipolis, France + + + June 2021 + + + + + + Graph Algorithm Techniques for Networks and Computational Structural Biology + + tel-03506086 + + + + November 2021 + + + + + + Graph problems motivated by (low and high) resolution models of large protein + assemblies + + + V.-H. + Viet-Ha + Nguyen + + + hal-03510188 + + + + December 2021 + + + + + + SARS-CoV-2 Through the Lens of Computational Biology:How bioinformatics is playing a key role + in the study of the virus and its origins + + + F. + Frédéric + Cazals + + + hal-03170023 + + + + March 2021 + 1-35 + + + + + + On the complexity of overlaying a hypergraph with a graph with bounded maximum degree + + + F. + Frédéric + Havet + + + D. + Dorian + Mazauric + + + V.-H. + Viet-Ha + Nguyen + + + hal-03368214 + + + + 2021 + + + + + + Crystal structure of chloroplast fructose-1,6-bisphosphate aldolase from the green alga + Chlamydomonas reinhardtii + + + T. 
+ Théo + Le Moigne + + + E. + Edoardo + Sarti + + + A. + Antonin + Nourisson + + + A. + Alessandra + Carbone + + + J. + Julien + Henri + + + hal-03521911 + 10.1101/2021.12.28.474321 + + + + January 2022 + + + + + + Gene prioritization based on random walks with restarts and absorbing states, to define gene + sets regulating drug pharmacodynamics from single-cell analyses + + + A. + Augusto + Sales-De-Queiroz + + + G. G. + Guilherme Guilherme + Sales Santa Cruz + + + A. + Alain + Jean-Marie + + + D. + Dorian + Mazauric + + + F. + Frédéric + Cazals + + + hal-03438430 + + + + November 2021 + + + + + + Molecular dynamics: survey of methods for simulating the activity of proteins + + + S. + S.A. + Adcock + + + A. + A.J. + McCammon + + + + + Chemical reviews + + 2006 + 106 + 5 + 1589--1615 + + + + + + The molecular architecture of the nuclear pore complex + + + F. + F. + Alber + + + S. + S. + Dokudovskaya + + + L. + L.M. + Veenhoff + + + W. + W. + Zhang + + + J. + J. + Kipper + + + D. + D. + Devos + + + A. + A. + Suprapto + + + O. + O. + Karni-Schmidt + + + R. + R. + Williams + + + B. + B.T. + Chait + + + A. + A. + Sali + + + M. + M.P. + Rout + + + + + Nature + + 2007 + 450 + 7170 + 695--701 + + + + + + Dynamics on statistical samples of potential energy surfaces + + + K. + K.D. + Ball + + + R. + R.S. + Berry + + + + + The Journal of chemical physics + + 1999 + 111 + 5 + 2060--2070 + + + + + + Thermodynamics and an Introduction to Thermostatistics + + + H. + H.B. + Callen + + + + + + 1985 + Wiley + + + + + + De novo design of picomolar SARS-CoV-2 miniprotein inhibitors + + + L. + L. + Cao + + + I. + I. + Goreshnik + + + B. + B. + Coventry + + + J. + J.B. + Case + + + L. + L. + Miller + + + L. + L. + Kozodoy + + + R. + R. + Chen + + + L. + L. + Carter + + + A. + A. + Walls + + + Y.-J. + Y-J. + Park + + + E.-M. + E-M + Strauch + + + L. + L. + Stewart + + + M. + M.S. + Diamond + + + D. + D. + Veesler + + + D. + D. 
+ Baker + + + + + Science + + 2020 + 370 + 6515 + 426--431 + + + + + + Energy landscapes and persistent minima + + + J. + J. + Carr + + + D. + D. + Mazauric + + + F. + F. + Cazals + + + D. J. + D. J. + Wales + + + 10.1063/1.4941052 + + + The Journal of Chemical Physics + + 2016 + 144 + 5 + 4 + + + + + + A practical volume algorithm + + + B. + B. + Cousins + + + S. + S. + Vempala + + + + + Mathematical Programming Computation + + 2016 + 8 + 2 + 133--160 + + + + + + Understanding molecular simulation + + + D. + D. + Frenkel + + + B. + B. + Smit + + + + + + 2002 + Academic Press + + + + + + Random walks and an <formula rend="inline"> + <math xmlns="http://www.w3.org/1998/Math/MathML"> + <mrow> + <msup> + <mi>O</mi> + <mo>*</mo> + </msup> + <mrow> + <mo>(</mo> + </mrow> + <msup> + <mi>n</mi> + <mn>5</mn> + </msup> + </mrow> + </math> + </formula>) volume algorithm for convex bodies + + + R. + R. + Kannan + + + L. + L. + Lovász + + + M. + M. + Simonovits + + + + + Random Structures & Algorithms + + 1997 + 11 + 1 + 1--50 + + + + + + A guide to Monte Carlo simulations in statistical physics + + + D. + D. + Landau + + + K. + K. + Binder + + + + + + 2014 + Cambridge university press + + + + + + Free energy computations: A mathematical perspective + + + T. + T. + Lelièvre + + + G. + G. + Stoltz + + + M. + M. + Rousset + + + + + + 2010 + World Scientific + + + + + + Prediction, determination and validation of phase diagrams via the global study of energy + landscapes + + + C. + C. + Schön + + + M. + M. + Jansen + + + + + Int. J. of Materials Research + + 2009 + 100 + 2 + 135 + + + + + + Improved protein structure prediction using potentials from deep learning + + + A. + A. + Senior + + + R. + R. + Evans + + + J. + J. + Jumper + + + J. + J. + Kirkpatrick + + + L. + L. + Sifre + + + T. + T. + Green + + + C. + C. + Qin + + + A. + A. + Żídek + + + A. + A. + Nelson + + + A. + A. + Bridgland + + + H. + H. + Penedones + + + S. + S. + Petersen + + + K. + K. + Simonyan + + + S. + S. 
+ Crossan + + + K. + K. + Pushmeet + + + D. + D. + Jones + + + D. + D. + Silver + + + K. + K. + Kavukcuoglu + + + D. + D. + Hassabis + + + + + Nature + + 2020 + 1--5 + + + + + + Atomic-level characterization of the structural dynamics of proteins. + + + D. E. + D. E. + Shaw + + + P. + P. + Maragakis + + + K. + K. + Lindorff-Larsen + + + S. + S. + Piana + + + R. O. + R. O. + Dror + + + M. P. + M. P. + Eastwood + + + J. A. + J. A. + Bank + + + J. M. + J. M. + Jumper + + + J. K. + J. K. + Salmon + + + Y. + Y. + Shan + + + W. + W. + Wriggers + + + + + Science + + 2010 + 330 + 6002 + 341--346 + + + + + + Energy Landscapes + + + D. J. + D. J. + Wales + + + + + + 2003 + Cambridge University Press + + + + + + Building force fields: an automatic, systematic, and reproducible approach + + + L.-P. + Lee-Ping + Wang + + + T. J. + Todd J + Martinez + + + V. S. + Vijay S + Pande + + + + + The journal of physical chemistry letters + + 2014 + 5 + 11 + 1885--1891 + + + +
+
+
+
diff --git a/RADAR2TEI/TestData/abs.xml b/RADAR2TEI/TestData/abs.xml new file mode 100644 index 0000000..c29b4df --- /dev/null +++ b/RADAR2TEI/TestData/abs.xml @@ -0,0 +1,2536 @@ + + + + + ABS + Algorithms - Biology - Structure + Computational Biology + Digital Health, Biology and Earth + http://team.inria.fr/abs + Creation of the Project-Team: 2008 July 01 + Project-Team + + A2.5. - Software engineering + A3.3.2. - Data mining + A3.4.1. - Supervised learning + A3.4.2. - Unsupervised learning + A6.1.4. - Multiscale modeling + A6.2.4. - Statistical methods + A6.2.8. - Computational geometry and meshes + A8.1. - Discrete mathematics, combinatorics + A8.3. - Geometry, Topology + A8.7. - Graph theory + A9.2. - Machine learning + + + B1.1.1. - Structural biology + B1.1.5. - Immunology + B1.1.7. - Bioinformatics + + + + + + Frédéric + Cazals + Chercheur + Sophia + Team leader, Inria, Senior Researcher + oui + + + Dorian + Mazauric + Chercheur + Sophia + Inria, Researcher + oui + + + Edoardo + Sarti + Chercheur + Sophia + Inria, Researcher, from Oct 2021 + + + Vladimir + Krajnak + PostDoc + Sophia + Inria, from Oct 2021 + + + Timothee + O Donnell + PhD + Sophia + Inria + + + Louis + Goldenberg + Stagiaire + Sophia + Inria, from Mar 2021 until Aug 2021 + + + Aarushi + Gupta + Stagiaire + Sophia + Inria, from May 2021 until Aug 2021 + + + Valentin + Madelaine + Stagiaire + Sophia + Inria, until Mar 2021 + + + Maximilien + Martin + Stagiaire + Sophia + École Normale Supérieure de Lyon, from Oct 2021 + + + Florence + Barbara + Assistant + Sophia + Inria + + + Charles + Robert + CollaborateurExterieur + Sophia + CNRS + oui + + + Konstantin + Roeder + CollaborateurExterieur + Sophia + Robinson College - Cambridge + + + + Overall objectives + + Biomolecules and their function(s). +

 Computational Structural Biology (CSB) is the scientific domain concerned with the development of algorithms + and software to understand and predict the structure and function of biological macromolecules. This research + field is inherently multi-disciplinary. On the experimental side, biology and medicine provide the objects + studied, while biophysics and bioinformatics supply experimental data, which are of two main kinds. On the one + hand, genome sequencing projects supply protein sequences, and ~200 million sequences have been + archived in UniProtKB/TrEMBL – which collects the protein sequences + yielded by genome sequencing projects. On the other hand, structure determination experiments (notably X-ray + crystallography, nuclear magnetic resonance, and cryo-electron microscopy) give access to geometric models of + molecules – atomic coordinates. Alas, only ~150,000 structures have been solved and deposited in the Protein + Data Bank (PDB), a number to be compared against the 10^8 sequences found in UniProtKB/TrEMBL. With one structure for ~1000 sequences, we hardly know anything + about biological functions at the atomic/structural level. Complementing experiments, physical + chemistry/chemical physics supply the required models (energies, thermodynamics, etc). More specifically, let us + recall that a protein with n atoms has d=3n Cartesian coordinates, and fixing + these (up to rigid motions) defines a conformation. As conveyed by the iconic lock-and-key + metaphor for interacting molecules, biology is based on the interactions stable conformations make with each + other. Turning these intuitive notions into quantitative ones requires delving into statistical physics, as + macroscopic properties are average properties computed over ensembles of conformations. Developing effective + algorithms to perform accurate simulations is especially challenging for two main reasons. 
The first one is the + high dimension of conformational spaces – see d=3n above, typically several tens of + thousands, and the non-linearity of the energy functionals used. The second one is the multiscale nature of the + phenomena studied: with biologically relevant time scales beyond the millisecond, and atomic vibration periods + of the order of femtoseconds, simulating such phenomena typically requires 10^12 conformations/frames, a (brute) + tour de force rarely achieved  .

+

+ + + Computational Structural Biology: three main challenges. +

 The first challenge, sequence-to-structure prediction, aims to infer the possible + structure(s) of a protein from its amino acid sequence. While progress has been made recently using in + particular deep learning techniques , the models obtained so far + are static and coarse-grained.

+

The second one is protein function prediction. Given a protein with known structure i.e. 3D coordinates, the goal is to predict the partners of this protein, in terms of stability + and specificity. This understanding is fundamental to biology and medicine, as illustrated by the example of the + SARS-CoV-2 virus responsible for the COVID-19 pandemic. To infect a host, the virus first fuses its envelope with + the membrane of a target cell, and then injects its genetic material into that cell. Fusion is achieved by a + so-called class I fusion protein, also found in other viruses (influenza, SARS-CoV-1, HIV, etc). The fusion + process is a highly dynamic process involving large amplitude conformational changes of the molecules. It is + poorly understood, which hinders our ability to design therapeutics to block it.

+ + + + + +
+ +
+ Figure 1: The synergy modeling - experiments, and challenges faced in CSB: illustration + on the problem of designing miniproteins blocking the entry of SARS-CoV-2 into cells. From . Of note: the first step of the infection by + SARS-CoV-2 is the attachment of the receptor binding domain of its spike (RBD, blue molecule), to a target + protein found on the membrane of our cells, ACE2 (orange molecule). A strategy to block infection is therefore + to engineer a molecule binding the RBD, preventing its attachment to ACE2. (A) Design of + a helical protein (orange) mimicking a region of the ACE2 protein. (B) Assessment of + binding modes (conformation, binding energies) of candidate miniproteins neutralizing the RBD. +
+

Finally, the third one, large assembly reconstruction, aims at solving (coarse-grain) + structures of molecular machines involving tens or even hundreds of subunits. This research vein was promoted + about 15 years back by the work on the nuclear pore complex . It is often + referred to as reconstruction by data integration, as it requires combining coarse-grain + models (notably from cryo-electron microscopy (cryo-EM) and native mass spectrometry) with atomic models of + subunits obtained from X-ray crystallography. Fitting the latter into the former requires exploring the + conformation space of subunits, whence the importance of protein dynamics.

+

As an illustration of these three challenges, consider the problem of designing proteins blocking the entry of + SARS-CoV-2 into our cells (Fig. ). The first challenge is illustrated + by the problem of predicting the structure of a blocker protein from its sequence of amino-acids – a tractable + problem here since the mini proteins used only comprise of the order of 50 amino-acids (Fig. (A), ). The second + challenge is illustrated by the calculation of the binding modes and the binding affinity of the designed + proteins for the RBD of SARS-CoV-2 (Fig. (B)). Finally, the last challenge is + illustrated by the problem of solving structures of the virus with a cell, to understand how many spikes are + involved in the fusion mechanism leading to infection. In , the promising + designs suggested by modeling have been assessed by an array of wet lab experiments (affinity measurements, + circular dichroism for thermal stability assessment, structure resolution by cryo-EM). The hyperstable minibinders identified provide starting points for SARS-CoV-2 therapeutics  . We note in passing that this is truly remarkable work, yet, + the designed proteins stem from a template (the bottom helix from ACE2), and are rather + small.

+ + + + + +
+ +
+ Figure 2: The main challenges of molecular simulation: Finding significant local minima of the energy + landscape, computing statistical weights of catchment basins by integrating Boltzmann's factor, and + identifying transitions. Practically, d>100. +
+
+ + Protein dynamics: core CS - maths challenges. +

To present challenges in structural modeling, let us recall the following ingredients. First, a molecular model + with n atoms + is parameterized over a conformational space 𝒳 of dimension d=3n in Cartesian coordinates, or + d=3n-6 in internal + coordinates upon removing rigid motions; these coordinates are also called degrees of freedom (d.o.f.). Second, + recall that the potential energy landscape (PEL) is the mapping V(·) from ℝ^d to ℝ providing a + potential energy for each conformation  , . Example potential energies (PE) are CHARMM, AMBER, MARTINI, etc. Such PE belong to the realm of + molecular mechanics, and implement atomic or coarse-grain models. They may include a solvent model, either + explicit or implicit. Their definition requires a significant number of parameters (up to 1,000), fitted to reproduce + physico-chemical properties of (bio-)molecules  .

+

These PE are usually considered good enough to study non-covalent interactions – our focus, even though they do + not cover the modification of chemical bonds. In any case, we take such a function for granted .

+

The PEL codes all structural, thermodynamic, and kinetic properties, + which can be obtained by averaging properties of conformations over so-called thermodynamic + ensembles. The structure of a macromolecular system requires the characterization of + active conformations and important intermediates in functional pathways involving significant basins. In + assigning occupation probabilities to these conformations by integrating Boltzmann's distribution, one treats + thermodynamics. Finally, transitions between the states, modeled, say, by a master + equation (a continuous-time Markov process), correspond to kinetics. Classical simulation + methods based on molecular dynamics (MD) and Monte Carlo sampling (MC) are developed in the lineage of the + seminal work by the 2013 recipients of the Nobel prize in chemistry (Karplus, Levitt, Warshel), which was + awarded “for the development of multiscale models for complex chemical systems”. However, + except for highly specialized cases where massive calculations have been used , neither MD nor MC give access to the aforementioned time + scales. In fact, the main limitation of such methods is that they treat structural, thermodynamic and kinetic + aspects at once . The absence of specific insights on these three + complementary pieces of the puzzle makes it impossible to optimize simulation methods, and results in general in + the inability to obtain converged simulations on biologically relevant time-scales.
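As a minimal illustration of the thermodynamic step described above, the sketch below assigns Boltzmann occupation probabilities to a handful of local minima. It is a deliberately crude assumption-laden example: each basin is reduced to its minimum energy (basin volumes and entropic contributions are ignored), and the function name, the energies, and the temperature are hypothetical values chosen for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_populations(energies, temperature=300.0):
    """Return probabilities p_i proportional to exp(-E_i / (k_B * T)).

    Assumption: each state is summarized by a single energy value, so the
    integral of Boltzmann's factor over a basin is replaced by one term.
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies to avoid overflow in exp
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function restricted to the listed states
    return [w / z for w in weights]


# Hypothetical energies (kcal/mol) of three local minima.
basin_energies = [0.0, 0.5, 2.0]
populations = boltzmann_populations(basin_energies)
```

A faithful treatment would integrate Boltzmann's factor over each catchment basin, which is precisely the high-dimensional integration problem the text refers to.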

+

The hardness of structural modeling owes to three intertwined reasons.

+

First, PELs of biomolecules usually exhibit a number of critical points exponential in the dimension  ; fortunately, they enjoy a multi-scale structure  . Intuitively, the significant local minima/basins are those + which are deep or isolated/wide, two notions which are mathematically + qualified by the concepts of persistence and prominence. Mathematically, problems are plagued with the curse of + dimensionality and measure concentration phenomena. Second, biomolecular processes are inherently multi-scale, + with motions spanning 15 and 4 orders of magnitude in time and amplitude respectively . Developing methods able to exploit this multi-scale + structure has remained elusive. Third, macroscopic properties of biomolecules i.e. + observables, are average properties computed over ensembles of conformations, which calls for a multi-scale + statistical treatment both of thermodynamics and kinetics.

+
+ + Validating models. +

A critical question concerns the validation of models proposed in structural + bioinformatics. For all three types of questions of interest (structures, thermodynamics, kinetics), there exist + experiments to which the models must be confronted – when the experiments can be conducted.

+

For structures, the models proposed can readily be compared against experimental results stemming from X-ray + crystallography, NMR, or cryo-electron microscopy. For thermodynamics, which we illustrate here with binding + affinities, predictions can be compared against measurements provided by calorimetry or surface plasmon + resonance. Lastly, kinetic predictions can also be assessed by various experiments such as binding affinity + measurements (for the prediction of K_on and K_off), or fluorescence + based methods (for kinetics of folding).

+
+
+ + Research program +

Our research program aims to develop a comprehensive set of novel concepts and algorithms to study protein + dynamics, based on the modular framework of PEL.

+ + Modeling the dynamics of proteins + Keywords: Molecular conformations, conformational exploration, energy landscapes, + thermodynamics, kinetics. +

As noticed while discussing Protein dynamics: core CS - maths challenges, the integrated + nature of simulation methods such as MD or MC is such that these methods do not in general give access to + biologically relevant time scales. The framework of energy landscapes , (Fig. ) is much more + modular, yet, large biomolecular systems remain out of reach.

+

To make a definitive step towards solving the prediction of protein dynamics, we will serialize the discovery + and the exploitation of a PEL , , . Ideas and + concepts from computational geometry/geometric motion planning, machine learning, probabilistic algorithms, and + numerical probability will be used to develop two classes of probabilistic algorithms. The first deals with + algorithms to discover/sketch PELs i.e. enumerate all significant (persistent or prominent) + local minima and their connections across saddles, a difficult task since the number of all local + minima/critical points is generally exponential in the dimension. To this end, we will develop a hierarchical + data structure coding PELs as well as multi-scale proposals to explore molecular conformations. (Nb: in Monte + Carlo methods, a proposal generates a new conformation from an existing one.) The second focuses on methods to + exploit/sample PELs i.e. compute so-called densities of states, from which all thermodynamic + quantities are given by standard relations  . This is a hard problem akin to high-dimensional + numerical integration. To solve this problem, we will develop a learning based strategy for the Wang-Landau + algorithm  –an adaptive Monte Carlo Markov Chain (MCMC) algorithm, as + well as a generalization of multi-phase Monte Carlo methods for convex/polytope volume calculations  , , for + non convex strata of PELs.

+
+ + Algorithmic foundations: geometry, optimization, machine learning + Keywords: Geometry, optimization, machine learning, randomized algorithms, + sampling, optimization.. +

As discussed in the previous Section, the study of PEL and protein dynamics raises difficult algorithmic / + mathematical questions. As an illustration, one may consider our recent work on the comparison of high + dimensional distributions , statistical tests / + two-sample tests , , + the comparison of clusterings , the complexity study of + graph inference problems for low-resolution reconstruction of assemblies , the analysis of partition (or clustering) stability + in large networks, the complexity of the representation of simplicial complexes . Making progress on such questions is + fundamental to advance the state of the art on protein dynamics.

+

We will continue to work on such questions, motivated by CSB / theoretical biophysics, both in the continuous + (geometric) and discrete settings. The developments will be based on a combination of ideas and concepts from + computational geometry, machine learning (notably on non linear dimensionality reduction, the reconstruction of + cell complexes, and sampling methods), graph algorithms, probabilistic algorithms, optimization, numerical + probability, and also biophysics.

+
+ + Software: the Structural Bioinformatics Library + Keywords: Scientific software, generic programming, molecular + modeling.. +

While our main ambition is to advance the algorithmic foundations of molecular simulation, a major challenge + will be to ensure that the theoretical and algorithmic developments will change the fate of applications, as + illustrated by our case studies. To foster such a symbiotic relationship between theory, algorithms and + simulation, we will pursue high quality software development and integration within the SBL, + and will also take the appropriate measures for the software to be widely adopted.

+ + Software in structural bioinformatics. +

Software development for structural bioinformatics is especially challenging, combining advanced geometric, + numerical and combinatorial algorithms, with complex biophysical models for PEL and related + thermodynamic/kinetic properties. Specific features of the proteins studied must also be accommodated. About + 50 years after the development of force fields and simulation methods (see the 2013 Nobel prize in chemistry), + the software implementing such methods has a profound impact on molecular science at large. One can indeed + cite packages such as CHARMM, AMBER, gromacs, gmin, MODELLER, Rosetta, VMD, PyMol, .... On the other hand, these packages are goal oriented, each tackling a (small set + of) specific goal(s). In fact, no real modular software design and integration has taken place. As a result, + despite the high quality software packages available, inter-operability between algorithmic building blocks + has remained very limited.

+
+ + The SBL. +

Predicting the dynamics of large molecular systems requires the integration of advanced algorithmic building + blocks / complex software components. To achieve a sufficient level of integration, we undertook the + development of the Structural Bioinformatics Library (SBL, , a generic C++/python + cross-platform library providing software to solve complex problems in structural bioinformatics. For + end-users, the SBL provides ready to use, state-of-the-art applications to model + macro-molecules and their complexes at various resolutions, and also to store results in perennial and easy to + use data formats (). For developers, the SBL provides a + broad C++/python toolbox with modular design (). This hybrid status targeting both + end-users and developers stems from an advanced software design involving four software components, namely + applications, core algorithms, biophysical models, and modules (). This modular design makes it possible to optimize + robustness and the performance of individual components, which can then be assembled within a goal oriented + application.

+
+
+ + Applications: modeling interfaces, contacts, and interactions + Keywords: Protein interactions, protein complexes, + structure/thermodynamics/kinetics prediction. +

Our methods will be validated on various systems for which flexibility operates at various scales. Examples of such + systems are antibody-antigen complexes, (viral) polymerases, (membrane) transporters.

+

Even very complex biomolecular systems are deterministic in prescribed conditions (temperature, pH, etc), + demonstrating that despite their high dimensionality, not all d.o.f. are at play at the same + time. This insight suggests three classes of systems of particular interest. The first class consists of systems + defined from (essentially) rigid blocks whose relative positions change thanks to conformational changes of + linkers; a Newton cradle provides an interesting way to envision such a system. We have recently worked on one + such system, a membrane protein involved in antibiotic resistance (AcrB, see ). The second class consists of cases where relative + positions of subdomains do not significantly change, yet, their intrinsic dynamics are significantly altered. A + classical illustration is provided by antibodies, whose binding affinity owes to dynamics localized in six + specific loops , . + The third class, consisting of composite cases, will greatly benefit from insights on the first two classes. As + an example, we may consider the spikes of the SARS-CoV-2 virus, whose function (performing infection) involves + both large amplitude conformational changes and subtle dynamics of the so-called receptor binding domain. We + have started to investigate this system, in collaboration with B. Delmas (INRAe) .

+

In ABS, we will investigate systems in these three tiers, in collaboration with expert collaborators, to + hopefully open new perspectives in biology and medicine. Along the way, we will also collaborate on selected + questions at the interface between CSB and systems biology, as it is now clear that the structural level and the + systems level (pathways of interacting molecules) can benefit from one another.

+
+
+ + Application domains +

The main application domain is Computational Structural Biology, as underlined in the Research + Program.

+
+ + Highlights of the year +

In October 2021, Edoardo Sarti joined ABS as Chargé de Recherche de Classe Normale. His + expertise comprises a diverse set of interests spanning from algorithmic questions about geometrical, functional + and evolutionary aspects of biomolecules (latest study: ), to + the collection and analysis of large collections of molecular structural data. From the very start, E. Sarti has + taken part in several research and technical projects of ABS.

+
+ + New software and platforms +

See report on the Structural Bioinformatics Library.

+ + New software + + SBL + + + + + + + + + + + +
+ + New results + + Modeling interfaces, contacts, and interactions + Keywords: docking, scoring, interfaces, protein complexes, Voronoi diagrams, + arrangements of balls. + + Boosting the analysis of protein interfaces with Multiple Interface String Alignments: illustration + on the spikes of coronaviruses + + + F. + Cazals + + + + In collaboration with S. Bereux and B. Delmas (INRAe, Jouy-en-Josas). + +

In this work , we introduce Multiple Interface + String Alignment (MISA), a visualization tool to display coherently various sequence and structure + based statistics at protein-protein interfaces (SSE elements, buried surface area, ΔASA, B factor values, etc). The + amino-acids supporting these annotations are obtained from Voronoi interface models. The benefit of MISA is to + collate annotated sequences of (homologous) chains found in different biological contexts i.e. bound with + different partners or unbound. The aggregated views MISA/SSE, MISA/BSA, MISA/ΔASA etc make it trivial to identify + commonalities and differences between chains, to infer key interface residues, and to understand where + conformational changes occur upon binding. As such, they should prove of key relevance for knowledge based + annotations of protein databases such as the Protein Data Bank.

+

Illustrations are provided on the receptor binding domain (RBD) of coronaviruses, in complex with their + cognate partner or (neutralizing) antibodies. MISA computed with a minimal number of structures complement and + enrich findings previously reported.

+

The corresponding package is available from the Structural Bioinformatics Library (

+

and ).

+
+ + SARS-CoV-2 Through the Lens of Computational Biology: How bioinformatics is playing a key role in the + study of the virus and its origins + + + F. + Cazals + + + + In collaboration with Samuel Alizon (MIVEGEC - Maladies infectieuses et vecteurs : écologie, + génétique, évolution et contrôle), Stéphane Guindon (MAB - Méthodes et Algorithmes pour la Bioinformatique, + LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier), Claire Lemaitre + (GenScale - Scalable, Optimized and Parallel Algorithms for Genomics, Inria Rennes – Bretagne Atlantique), + Tristan Mary-Huard (INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et + l’Environnement), Anna Niarakis (Lifeware - Computational systems biology and optimization, Inria Saclay - Ile + de France; GenHotel - Laboratoire de recherche européen pour la polyarthrite rhumatoïde), Mikaël Salson + (CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189), Celine Scornavacca + (UMR ISEM - Institut des Sciences de l'Evolution de Montpellier), Hélène Touzet (CRIStAL - Centre de Recherche + en Informatique, Signal et Automatique de Lille - UMR 9189). + +

In December 2019, the Chinese Center for Disease Control reported several cases of severe pneumonia + resisting usual treatments in the city of Wuhan. This was the beginning of the COVID-19 pandemic, which caused + more than 80 million infection cases and 1.7 million deaths during the year 2020 alone. This major + outbreak has given rise to global public health responses as well as an international research effort of + unprecedented scope and speed. This scientific mobilization has led to remarkable results, which have enabled + a great deal of knowledge to be accumulated in just a few months on this novel pathogen: identification of the + virus, of its main proteins, analysis of its origin and its functioning. This basic biological knowledge is + essential to medical advances: designing tests, finding a vaccine or a cure.

+

In this document , one year after the beginning of the worldwide + spread of the disease, we wish to shed particular light on the contribution of bioinformatics in all this + work. Bioinformatics is a discipline at the crossroads of computer science, mathematics and biology that has + taken on an inestimable importance in modern biology and medicine. It provides computational models, + algorithms and software to the scientific community, that are both operational and effective. The discovery + and study of the SARS-CoV-2 coronavirus is an emblematic example. The use of bioinformatics methods + has been at the heart of essential milestones: from the sequencing of the virus genome and its annotation to + the history of its origin, the modeling of interacting biological entities both at the molecular scale and + at the network scale, and the study of the host genetic susceptibility. All these studies, as a whole, have + made it possible to elucidate the nature and the functioning of the novel pathogen and have greatly + contributed to the fight against COVID-19.

+
+ + Gene prioritization based on random walks with restarts and absorbing states, to define gene sets + regulating drug pharmacodynamics from single-cell analyses + + + F. + Cazals + + + D. + Mazauric + + + A. + Sales de Queiroz + + + G. + Sales Santa Cruz + + + + In collaboration with Alain Jean-Marie (Inria Neo) and Jérémie Roux (Inserm and CNRS and UCA). + +

Prioritizing genes for their role in drug sensitivity is an important step in understanding drug mechanisms + of action and discovering new molecular targets for co-treatment. In this work , we formalize this problem by considering + two sets of genes X and P respectively composing the predictive gene signature of sensitivity to a + drug and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) + containing the products of X and P as nodes. We introduce Genetrank, a method to prioritize + the genes in X for their likelihood to regulate the genes in P.

+

Genetrank uses asymmetric random walks with restarts, absorbing states, and a suitable + renormalization scheme. Using novel so-called saturation indices, we show that the conjunction of absorbing + states and renormalization yields an exploration of the PPIN which is much more progressive than that afforded + by random walks with restarts only. Using MINT as underlying network, we apply Genetrank to + a predictive gene signature of cancer cells sensitivity to tumor-necrosis-factor-related apoptosis-inducing + ligand (TRAIL), performed in single-cells. Our ranking provides biological insights on drug sensitivity and a + gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most + significant differentially expressed genes obtained from a statistical analysis framework alone. We also + introduce gene expression radars, a visualization tool to assess all pairwise interactions + at a glance.

+

Genetrank is made available in the Structural Bioinformatics Library (). It should prove useful for mining gene sets in + conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.
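To make the baseline mentioned above concrete, here is a minimal sketch of a plain random walk with restarts on a small undirected graph, computed by fixed-point iteration. It is not Genetrank: the absorbing states, the asymmetry, and the renormalization scheme described in the text are deliberately omitted, and the function name, the toy network, and the restart probability are assumptions made for illustration only.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Stationary distribution of a random walk with restarts.

    adjacency: symmetric 0/1 matrix (list of lists) of an undirected graph
    seeds:     set of node indices the walker restarts from
    restart:   probability of jumping back to a seed at each step
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Restart distribution: uniform over the seed nodes.
    e = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbors.
            inflow = sum(adjacency[j][i] * p[j] / degree[j]
                         for j in range(n) if degree[j] > 0)
            new_p.append((1.0 - restart) * inflow + restart * e[i])
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds={0})
```

On this toy path the scores decay with the distance to the seed, which is the ranking signal a method of this family exploits.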

+
+
+ + Modeling the dynamics of proteins + Keywords: protein, flexibility, collective coordinate, conformational sampling + dimensionality reduction. + + Tripeptide loop closure: a detailed study of reconstructions based on Ramachandran + distributions + + + F. + Cazals + + + T. + O'Donnell + + + + In collaboration with C. Robert (IBPC / CNRS, Paris, France). + +

Tripeptide loop closure (TLC) is a standard procedure to reconstruct protein backbone conformations, by + solving a polynomial system in a single variable yielding up to 16 real solutions.

+

In this work , we first show that multiprecision is required in + a TLC solver to guarantee the existence and the accuracy of solutions. We then compare solutions yielded by + the TLC solver against tripeptides from the Protein Data Bank. We show that these solutions are geometrically + diverse (up to 3Å RMSD with respect to the data), and sound in terms + of potential energy. Finally, we compare Ramachandran distributions of data and reconstructions for the three + amino acids. The distribution of reconstructions in the second angular space (ϕ_2,ψ_2) + stands out, with a rather uniform distribution leaving a central void.

+

We anticipate that these insights, coupled to our robust implementation in the Structural Bioinformatics + Library (), will help understand the + properties of TLC reconstructions, with potential applications to the generation of conformations of flexible + loops in particular.

+
+
+ + Algorithmic foundations + Keywords: Computational geometry, computational topology, optimization, data + analysis. + + Frechet mean and p-mean on the unit circle: characterization, decidability, and algorithm + + + F. + Cazals + + + T. + O'Donnell + + + + In collaboration with B. Delmas (INRAe, Jouy-en-Josas). + +

The center of mass of a point set lying on a manifold generalizes the celebrated Euclidean centroid, and is + ubiquitous in statistical analysis in non Euclidean spaces.

+

In this work , we give a complete characterization of the weighted + p-mean of + a finite set of angular values on S^1, based on a decomposition of S^1 such that the functional of interest has at most one + local minimum per cell. This characterization is used to show that the problem is decidable for rational + angular values –a consequence of Lindemann's theorem on the transcendence of π, and to develop an effective + algorithm parameterized by exact predicates. A robust implementation of this algorithm based on + multi-precision interval arithmetic is also presented, and is shown to be effective for large values of + n and + p. We use + it as building block to implement the k-means and k-means++ clustering algorithms on the flat torus, with + applications to clustering protein molecular conformations. These algorithms are available in the Structural + Bioinformatics Library ().

+

Our derivations are of interest in two respects. First, efficient p-mean calculations are relevant to + develop principal components analysis on the flat torus encoding angular spaces–a particularly important case + to describe molecular conformations. Second, our two-stage strategy stresses the interest of combinatorial + methods for p-means, also emphasizing the role of numerical issues.
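The combinatorial flavor of the problem can be sketched in the simplest setting, namely the unweighted case p=2. The sketch below assumes the classical fact that, for n angles, every local minimum of the sum of squared geodesic distances is the arithmetic mean of the angles shifted by a multiple of 2π/n, so that n candidates suffice. It uses plain floating point numbers rather than the exact predicates and interval arithmetic of the work described above, and all function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def geodesic_distance(a, b):
    """Shortest arc length between two angles on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def frechet_cost(theta, angles):
    """Sum of squared geodesic distances from theta to the angles (p = 2)."""
    return sum(geodesic_distance(theta, a) ** 2 for a in angles)


def frechet_mean_circle(angles):
    """Frechet mean (p = 2, unit weights) of angles given in radians.

    Candidates are the arithmetic mean shifted by 2*pi*k/n, k = 0..n-1;
    the one with the lowest cost is returned (ties broken arbitrarily).
    """
    n = len(angles)
    wrapped = [a % TWO_PI for a in angles]
    total = sum(wrapped)
    candidates = [((total + TWO_PI * k) / n) % TWO_PI for k in range(n)]
    return min(candidates, key=lambda t: frechet_cost(t, wrapped))


# Two angles on either side of 0: the naive arithmetic mean (pi) is wrong.
mean_across_cut = frechet_mean_circle([0.1, TWO_PI - 0.1])
mean_cluster = frechet_mean_circle([1.0, 1.2, 1.4])
```

The first example shows why angular data cannot simply be averaged: the arithmetic mean of 0.1 and 2π-0.1 is π, whereas the Fréchet mean is 0.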

+
+ + Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary reflections and + sweet arithmetics + + + F. + Cazals + + + A. + Chevallier + + + + In collaboration with S. Pion IMS (Univ. Bordeaux / Bordeaux INP / CNRS UMR 5218). + +

Computing the volume of a high dimensional polytope is a fundamental problem in geometry, also connected to + the calculation of densities of states in statistical physics, and a central building block of such algorithms + is the method used to sample a target probability distribution.

+

This paper studies Hamiltonian Monte Carlo (HMC) with + reflections on the boundary of a domain, providing an enhanced alternative to Hit-and-run (HAR) to sample a + target distribution restricted to the polytope. We make three contributions. First, we provide a convergence + bound, paving the way to more precise mixing time analysis. Second, we present a robust implementation based + on multi-precision arithmetic, a mandatory ingredient to guarantee exact predicates and robust constructions. + We however allow controlled failures to happen, introducing the Sweeten Exact Geometric + Computing (SEGC) paradigm. Third, we use our HMC random walk to perform H-polytope volume calculations, + using it as an alternative to HAR within the volume algorithm by Cousins and Vempala. The systematic tests + conducted up to dimension n=100 on the cube, the isotropic and the + standard simplex show that HMC significantly outperforms HAR both in terms of accuracy and running time. + Additional tests show that calculations may be handled up to dimension n=500. + These tests also establish that multiprecision is mandatory to avoid exits from the polytope.

+
+ + Overlaying a hypergraph with a graph with bounded maximum degree, with application for low-resolution + reconstructions of molecular assemblies + + + D. + Mazauric + + + + In collaboration with F. Havet, T. V. H. Nguyen laboratoire I3S (CNRS, Université Côte d'Azur). + +

We analyze a generalization of the minimum connectivity inference problem (MCI) that models the computation + of low-resolution structures of macro-molecular assemblies, based on data obtained by native mass + spectrometry. The generalization studied in this work allows us to consider more refined constraints for the + characterization of low-resolution models of large assemblies, such as degree constraints (e.g. a protein has + a limited number of other proteins in contact).

+

More precisely, let G and H be respectively a graph and a hypergraph defined on the same set of vertices, + and let F be + a graph. We say that G F-overlays a hyperedge + S of + H if the + subgraph of G induced by S contains F as a spanning subgraph, and + that G F-overlays H if it F-overlays every hyperedge of + H. For a + fixed graph F and a fixed integer k, the problem (Δ ≤ k)-F-Overlay + consists in deciding whether there exists a graph with maximum degree at most k that F-overlays a given hypergraph + H. In , we prove that for any graph F which is neither complete + nor anticomplete, there exists an integer np(F) such that (Δ ≤ k)-F-Overlay is + NP-complete for all k ≥ np(F).
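The definition above can be checked by brute force on small instances. The sketch below is only an illustration of the definition, not an algorithm from the work: it tries every bijection between the vertices of F and a hyperedge, so it is exponential in the size of F. The function names are hypothetical.

```python
from itertools import permutations


def overlays_hyperedge(g_edges, f_vertices, f_edges, hyperedge):
    """True if the subgraph of G induced by `hyperedge` contains F as a
    spanning subgraph (brute force over all vertex bijections)."""
    targets = sorted(hyperedge)
    if len(targets) != len(f_vertices):
        return False  # a spanning subgraph needs the same number of vertices
    g = {frozenset(e) for e in g_edges}
    for image in permutations(targets):
        phi = dict(zip(f_vertices, image))
        if all(frozenset((phi[a], phi[b])) in g for a, b in f_edges):
            return True
    return False


def max_degree(g_edges):
    """Maximum degree of the simple graph given by its edge list."""
    degree = {}
    for u, v in {frozenset(e) for e in g_edges}:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values(), default=0)


def is_bounded_overlay(g_edges, f_vertices, f_edges, hyperedges, k):
    """True if G has maximum degree at most k and F-overlays every hyperedge."""
    return max_degree(g_edges) <= k and all(
        overlays_hyperedge(g_edges, f_vertices, f_edges, s) for s in hyperedges
    )


# F is a path on three vertices; H has two hyperedges of size three.
f_vertices, f_edges = [0, 1, 2], [(0, 1), (1, 2)]
hyperedges = [{"a", "b", "c"}, {"b", "c", "d"}]
g_edges = [("a", "b"), ("b", "c"), ("c", "d")]
ok_k2 = is_bounded_overlay(g_edges, f_vertices, f_edges, hyperedges, 2)
ok_k1 = is_bounded_overlay(g_edges, f_vertices, f_edges, hyperedges, 1)
```

Here the path a-b-c-d has maximum degree 2 and contains a spanning path in each hyperedge, so the check succeeds for k = 2 and fails for k = 1.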

+
+ + Conflict coloring problems: complexity and application to high resolution biological assembly + modeling + + + F. + Cazals + + + D. + Mazauric + + + + In collaboration with F. Havet, T. V. H. Nguyen laboratoire I3S (CNRS, Université Côte d'Azur). + +

Given a graph G=(V,E), + a color set C(v) for each vertex v ∈ V, and a bipartite graph between + color sets C(u) and C(v) for every edge uv ∈ E, Conflict + Coloring consists in deciding whether there exists a conflict coloring, that is, a coloring c in which c(u)c(v) + is not an edge of the bipartite graph associated with uv, for every edge uv ∈ E. Conflict Coloring is motivated by computational + structural biology problems, notably the high-resolution determination of molecular assemblies. The graph represents the + subunits and the interactions between them, the colors are the given conformations, and the edges of the + bipartite graphs encode the incompatible conformations of two subunits.

+

In this work, we first establish the complexity dichotomies (polynomial vs NP-complete) + for Conflict Coloring and its variants. We provide some experiments in which we build + instances of Conflict Coloring associated with Voronoi diagrams in the + plane, and we then analyse the existence of a solution as a function of the parameters used in our experimental + setup.
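The Conflict Coloring definition can be made concrete with a small brute-force sketch. It is an illustration only (exhaustive search over all color choices), not one of the algorithms studied in this work. The data layout, with `conflicts[(u, v)]` holding the forbidden color pairs of edge uv, is an assumption made for the example.

```python
from itertools import product


def is_conflict_coloring(edges, conflicts, coloring):
    """True if no edge uv receives a pair (c(u), c(v)) listed as a conflict."""
    return all(
        (coloring[u], coloring[v]) not in conflicts.get((u, v), set())
        for u, v in edges
    )


def find_conflict_coloring(color_sets, edges, conflicts):
    """Exhaustively search for a conflict coloring; return it, or None."""
    vertices = sorted(color_sets)
    for choice in product(*(sorted(color_sets[v]) for v in vertices)):
        coloring = dict(zip(vertices, choice))
        if is_conflict_coloring(edges, conflicts, coloring):
            return coloring
    return None


# Two subunits u and v; conformation a1 of u is incompatible with b1 of v.
color_sets = {"u": {"a1", "a2"}, "v": {"b1"}}
edges = [("u", "v")]
conflicts = {("u", "v"): {("a1", "b1")}}
solution = find_conflict_coloring(color_sets, edges, conflicts)
```

In this toy instance the only valid choice assigns a2 to u and b1 to v. Adding the pair (a2, b1) to the conflicts leaves no solution.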

+
+
+
+ + Partnerships and cooperations + + + F. + Cazals + + + D. + Mazauric + + + + + International research visitors + + Visits of international scientists + + Inria International Chair + +
  • David Wales, Cambridge University, holds an endowed chair within 3IA Côte d'Azur / ABS.
  • +
    +
    +
    +
    +
    + + Dissemination + + Promoting scientific activities + + Scientific events: organisation +

    Frédéric Cazals was involved in the organization of:

    + +
  • Symposium Multidisciplinary approaches in cancer research, Organized at Inria Sophia + Antipolis Méditerranée. Web: .
  • +
  • Winter School Machine Learning Methods to Analyze and Predict Protein Structure, Dynamics + and Function, CIRM, Luminy, November 7-12, 2021. Web: .
  • +
  • + Critical evaluation of methods for scoring interfaces of protein complexes, Online Elixir + 3D-Bioinfo meeting, organized by Emmanuel Levy (Elixir IL), Frederic Cazals (Elixir FR), Shoshana Wodak + (Elixir BE).
  • +
    +
    + + Scientific events: selection + + Member of the conference program committees +

    Frédéric Cazals participated in the following program committees:

    + +
  • Symposium on Solid and Physical Modeling
  • +
  • Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology + (ECCB)
  • +
    +
    +
    + + Invited talks +

    Frédéric Cazals gave the following invited talks:

    + +
  • + Mining protein flexibility: a new class of move sets; GDR BIM/GT MASIM, November 2021; + UCA, 5th Academy 4 Research Webinar - Mental Retardation and Protein Dynamics, October 2021.
  • +
    +
    + + Leadership within the scientific community +

    Frédéric Cazals

    + +
  • 2010-...: Member of the steering committee of the GDR Bioinformatique Moléculaire, for the Structure and + macro-molecular interactions theme.
  • +
  • 2017-...: Co-chair, with Yann Ponty, of the working group / groupe de travail (GT MASIM - Méthodes + Algorithmiques pour les Structures et Interactions Macromoléculaires), within the GDR de Bioinformatique + Moléculaire (GDR BIM, ).
  • +
    +
    + + Research administration +

    Frédéric Cazals

    + +
  • 2018-...: Member of the bureau du comité des équipes projets.
  • +
  • 2020-...: Member of the bureau of the EUR Life, Université Côte d’Azur.
  • +
    +

    Dorian Mazauric

    + +
  • 2019-...: Member of the comité Plateformes.
  • +
    +
    +
    + + Teaching - Supervision - Juries + + Teaching + +
  • 2014–...: Master Data Sciences Program (M2), Department of Applied Mathematics, Ecole Centrale-Supélec; + Foundations of Geometric Methods in Data Analysis; F. Cazals and M. Carrière, Inria + Sophia / (ABS, DataShape). Web: .
  • +
  • 2021–...: Master Data Sciences & Artificial Intelligence (M1), Université Côte d’Azur; Introduction to machine learning (course practicals); E. Sarti.
  • +
  • 2021–...: Master Data Sciences & Artificial Intelligence (M2), Université Côte d’Azur; Geometric and topological methods in machine learning; F. Cazals, J-D. Boissonnat and M. Carrière, + Inria Sophia / (ABS, DataShape, DataShape); Web: .
  • +
  • 2021–...: Master Cancérologie et Recherche Translationnelle (M2), Université Côte d’Azur; Binding affinity maturation and protein interaction network analysis: two examples of bioinformatics + applications in medicine; F. Cazals.
  • +
  • 2020–...: Master Sciences du Vivant (M2), parcours Biologie, Informatique, Mathématiques, Université Côte + d’Azur; Introduction to statistical physics of biomolecules; F. Cazals.
  • +
  • 2018–...: Master: Algorithmique et Complexité, 23.5 hours of tutorials (TD), M1 level, Polytech Nice Sophia, Université Côte + d'Azur, Computer Science track (Sciences Informatiques), France; Dorian Mazauric.
  • +
    +
    + + Supervision +

    PhD thesis:

    + +
  • + PhD in progress, 3rd year: Timothée O'Donnell, Modeling the influenza + polymerase. Université Côte d'Azur. Thesis co-supervised by Frédéric Cazals and Bernard Delmas, INRA + Jouy-en-Josas.
  • +
  • + Defended PhD: Thi Viet Ha Nguyen, Graph Algorithms techniques for (low + and high) resolution models of large protein assemblies. Université Côte d'Azur. Thesis co-supervised + by Frédéric Havet, Laboratoire I3S (CNRS, Université Côte d'Azur).
  • +
    +

    Interns:

    + +
  • Aarushi Gupta, intern from IIT Delhi, summer 2021. Modeling protein backbone flexibility + using solutions of the tripeptide loop closure. +
  • +
  • Louis Goldenberg, intern from Ecole Polytechnique, summer 2021. Parametric models for + compact clusters. +
  • +
  • Sebastián Gallardo Diaz, Universidad Técnica Federico Santa María, Valparaíso, Chile. Advisors: Pierre + Kornprobst (Inria project-team Biovision), Dorian Mazauric. Algorithms for a new packing + problem: Towards Reading Accessible Newspapers.
  • +
  • Vivian Losciale, Université Côte d'Azur. Advisors: Jérémy Camponovo, Frédéric Havet, Buntheng Ly, Dorian + Mazauric, Maxime Sermesant. Jeux-vidéos de médiation : Intelligence artificielle pour + l’imagerie médicale.
  • +
  • Quentin Larose, Université Côte d'Azur. Advisors: Agnès Bessière, Carole Clastres, Jérémy Camponovo, Luc + Hogie, Dorian Mazauric, Eric Pascual, Sandrine Selosse, Brigitte Trousse. Portail des + ressources Terra Numerica.
  • +
    +
    + + Juries +

    Frédéric Cazals participated in the following committees:

    + +
  • Luke Dicks, Cambridge University, April 2021. Rapporteur for the PhD thesis K-means + landscapes: exploring clustering solution spaces using energy landscape theory. Advisor: David + Wales.
  • +
  • Manon Ruffini, Univ. of Toulouse, March 2021. Rapporteur on the PhD thesis Models and + Algorithms for Computational Protein Design. Advisor: Thomas Schiex.
  • +
  • Dorian Mazauric, Habilitation thesis, Université Côte d'Azur, November 2021. Committee member (president) + for the habilitation Algorithmique des graphes pour les réseaux et la biologie structurale + computationnelle.
  • +
    +

    Dorian Mazauric participated in the following committees:

    + +
  • Thi Viet Ha Nguyen, Université Côte d'Azur, December 2021. Committee member for the PhD thesis Graph Algorithms techniques for (low and high) resolution models of large protein + assemblies. Advisors: Frédéric Havet, Dorian Mazauric.
  • +
    +
    +
    + + Popularization + + Internal or external Inria responsibilities +

    Dorian Mazauric:

    + +
  • 2019–...: Head of Commission (Médiation et Animation des + MAthématiques, des Sciences et Techniques Informatiques et des Communications), Inria Sophia Antipolis - + Méditerranée.
  • +
  • 2019–...: Coordinator of Terra Numerica, an ambitious scientific + popularisation project. Its main goal is to create a "Dedicated Digital space" in the south of France (in + the spirit of the "Cité des Sciences" or "Palais de la découverte" in Paris). To do so, Terra Numerica is + developing and structuring popularisation activities and materials, which are deployed at different sites + throughout the territory (e.g. Espace Terra Numerica - Valbonne Sophia Antipolis, MIA, in schools, + exhibition extensions...). This large-scale project brings together all the actors of research, + education, industry, associations and local authorities. It currently comprises more than one hundred + people.
  • +
  • 2018–...: Member of the board (Conseil d'Administration) of the association Les Petits Débrouillards.
  • +
  • 2017–...: Member of the outreach project Galéjade : Graphes et ALgorithmes : Ensemble de Jeux À Destination + des Ecoliers... (mais pas que).
  • +
    +
    + + Articles and contents +

    Frédéric Cazals:

    + +
  • Podcast Investiga’Sciences Vive la protéine: interview and discussion with + Thomas Schiex and Frédéric Cazals, hosted by Valérie Ravinet, October 2021. .
  • +
    +

    Dorian Mazauric:

    + +
  • Participation in the development of .
  • +
  • Participation in the development of popularization video games .
  • +
    +
    + + Interventions +

    Dorian Mazauric - Fête de la Science 2021:

    + +
  • Village des Sciences de Villeneuve-Loubet, with Pobot. Saturday 2 and Sunday 3 October 2021. + With Thomas Dissaux, Adrien Gausseran, Nicolas Nisse, Eric Pascual, Lucas Picassari-Arrieta, Brigitte + Trousse.
  • +
  • Village des sciences de la vallée de la Vésubie, with Les Apprentis Pas Sages. Saturday 2 October 2021. Puzzle du nid d’abeilles – Graphes et algorithmes grandeur nature. With Samantha + Lanney-Ricci, Magali Martin-Mazauric.
  • +
  • Talks at the Campus International de Valbonne. Monday 4 October 2021. La magie du binaire + – Pas besoin de réfléchir, les ordinateurs calculent tellement vite ? Problèmes actuels en + algorithmique. With Estelle Zavoli.
  • +
  • Scientific workshop at the Espace d’Art Concret (EAC), Mouans-Sartoux. Organized by the EAC (Amandine Briand, + Sabrina Lah, Martin Merle, Claire Spada, Brigitte Segatori, Roubaud). Monday 4 to Friday + 8 October 2021. Des reines sur une oeuvre d’art (Tenth Copper Corner une oeuvre minimaliste + de Carl André formée de 55 carreaux) : mathématiques et algorithmique. With Frédéric Havet, Nicolas + Nisse, Martine Olivi. In collaboration with Geoffroy Aubry and Valérie Doya (physics workshop).
  • +
  • Talk at the collège La chênaie de Mouans-Sartoux. Wednesday 6 October 2021. La magie des + graphes et du binaire – Pas besoin de réfléchir, les ordinateurs calculent tellement vite ? Problèmes + actuels en algorithmique. With Mylène Raibaudi, Brigitte Trousse.
  • +
  • Village des Sciences de Mouans-Sartoux. Saturday 9 October 2021. With Sabrina Barnabé, Martine Olivi, Brigitte + Trousse, Thierry Viéville.
  • +
  • Festival des sciences de Nice (Université Côte d’Azur). Saturday 9 and Sunday 10 October 2021. + With Alexandre Bonlarron, Foivos Fioravantes, Victor Jung, Hicham Lesfari, Steve Malalel, Magali + Martin-Mazauric, Romain Michelucci, Nicolas Nisse, Marie Pelleau, Nina Singlan, Rudan Xiao.
  • +
  • Talks at the collège de Roquebillière. Thursday 7 October 2021. La magie des graphes et du + binaire – Jeux combinatoires – Pas besoin de réfléchir, les ordinateurs calculent tellement vite ? + Problèmes actuels en algorithmique – Ateliers Jeux Graphes et Algorithmes. With Samantha + Lanney-Ricci.
  • +
  • Lecture at the médiathèque de Biot. Friday 8 October 2021.
  • +
  • Village des Sciences et de l’Innovation de la CASA, Antibes Juan-les-Pins, with PoBot, SLV, @b4games. + Saturday 16 and Sunday 17 October 2021. With Agnès Bessière, Armel Berceliot, Étienne Chaplain, + Thomas Dissaux, Thierry Lespinasse, Stéphane Mansour, Magali Martin-Mazauric, Nicolas Nisse, Eric Pascual, + Lucas Picassari-Arrieta, Frédéric Rallo, Sandrine Selosse, Brigitte Trousse.
  • +
    +

    Dorian Mazauric - Interventions at Maison de l'Intelligence Artificielle:

    + +
  • Terra Numerica workshops with students of the Master SmartEdTech. Wednesday 14 April 2021. Intensive + day of hybrid training, facilitation and co-creation of workshops. With Saint-Clair Lefevre, Frédéric + Havet, Margarida Romero, Thierry Viéville.
  • +
    +

    Dorian Mazauric - Cordées de la réussite (coordinated by Université Côte d'Azur):

    + +
  • Two classes from the collège Henri Nans, Aups. Les sciences du numérique à portée de mains ! Découvrir, + Explorer, Expérimenter ! Pirates et trésor : des maths et des algorithmes à la programmation + Scratch et mBot. With Frédéric Havet, Eric Pascual, Brigitte Trousse.
  • +
    +

    Dorian Mazauric - Programme Chiche:

    + +
  • Talk at the lycée Apollinaire, Nice. Thursday 14 October 2021.
  • +
  • Talk at the lycée Estienne d’Orves, Nice. Thursday 21 October 2021.
  • +
  • Talk at the CIV, Valbonne Sophia Antipolis. Thursday 2 December 2021.
  • +
    +

    Dorian Mazauric - Training sessions:

    + +
  • Teacher training co-organized by the DANE and Terra Numerica, using the Terra Numerica workshops at the + Maison de l’Intelligence Artificielle. Tuesdays 9 March, 23 March, 6 April, 20 + April and 25 May 2021. Machine d’apprentissage par renforcement pour gagner aux jeux, + Initiation à la reconnaissance d’images avec des drones, ateliers d’informatique débranchée. With + Jérémy Camponovo, Frédéric Havet, Eric Pascual, Brigitte Trousse.
  • +
  • Training of CASA media library staff. Friday 25 June 2021 and Thursday 23 September 2021. + Training on the foundations of computer science: Transmission de pensée – La magie du + binaire.
  • +
  • Presentation and training at the Fab’Ecole 06 of the DRANE, collège Bertone, Antibes. Friday 26 November + 2021. Presentation of and training on Terra Numerica workshops. With Brigitte Trousse.
  • +
    +

    Dorian Mazauric - In schools:

    + +
  • Collège Bechet, Antibes Juan-les-Pins. Monday 8 March 2021. As part of the educational project Ethique des + données et de l’information (1/3). Introductions aux algorithmes. With Sylvain Etienne, Frédéric Giroire, + Géraldine Rouard, Brigitte Trousse.
  • +
  • Centre International de Valbonne Sophia Antipolis. Monday 15 March 2021. As part of sessions on + artificial intelligence with a final-year (terminale) class of the CIV, organized by Les + Petits Débrouillards. Intelligence Artificielle et reconnaissance d’images. With Marie Barbieux, Marine + Beaudet, Soledad Tolosa.
  • +
  • Collège Bechet, Antibes Juan-les-Pins. Friday 26 March 2021 and 9 April 2021. As part of the educational + project Ethique des données et de l’information (2/3). Modélisation d’un réseau social et + de contenus, et algorithmes de recommandation. With Sylvain Etienne, Frédéric Giroire, Géraldine + Rouard, Brigitte Trousse.
  • +
  • Collège Bechet, Antibes Juan-les-Pins. Monday 7 June 2021. As part of the educational project Ethique des + données et de l’information (3/3). Lecture Protection des données et métier de Déléguée à + la Protection des Données d’Inria (Anne Combe). With Anne Combe, Sylvain Etienne, Frédéric Giroire, + Géraldine Rouard, Brigitte Trousse.
  • +
  • Roquefort-les-Pins. As part of the activities of the town's day camp. Monday 26 and + Tuesday 27 July 2021. Three half-days: unplugged computer science workshops (ages 3 to + 6), workshops on recommendation algorithms in social networks (for + teenagers) and magic tricks to discover how a computer counts (ages 6 to 10). With + Frédéric Havet.
  • +
  • Lycée International de Valbonne. Thursday 2 December 2021. Ateliers algorithmiques grandeur + nature. With Bérengère Abric, Perrine Le Dûs.
  • +
    +

    Dorian Mazauric - Internships:

    + +
  • Thirteen ninth-grade (troisième) interns at the Inria centre at Université Côte d’Azur. Monday 13 to Friday + 17 December 2021.
  • +
    +
    +
    +
    + + + + + + Distributed Link Scheduling in Wireless Networks + + + V. + Vishal + Misra + + + P. + Philippe + Nain + + + + + Discrete Mathematics, Algorithms and Applications + + 2020 + 12 + 5 + 1-38 + + + + + + + + On the complexity of the representation of simplicial complexes by trees + + + J.-D. + Jean-Daniel + Boissonnat + + + + + Theoretical Computer Science + + February 2016 + 617 + 17 + + + + + + + Energy landscapes and persistent minima + + + J. + J. + Carr + + + D. + D. + Mazauric + + + F. + F. + Cazals + + + D. J. + D. J. + Wales + + + + + The Journal of Chemical Physics + + 2016 + 144 + 5 + 4 + + + + + + + Conformational Ensembles and Sampled Energy Landscapes: Analysis and Comparison + + + F. + F. + Cazals + + + T. + T. + Dreyfus + + + D. + D. + Mazauric + + + A. + A. + Roth + + + C. + C.H. + Robert + + + + + J. of Computational Chemistry + + 2015 + 36 + 16 + 1213--1231 + + + + + + + The Structural Bioinformatics Library: modeling in biomolecular science and beyond + + + F. + Frédéric + Cazals + + + T. + Tom + Dreyfus + + + + + + October 2016 + RR-8957 + + + + + + + Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces + + + F. + Frédéric + Cazals + + + + + IEEE/ACM International Conference on Data Science and Advanced Analytics + IEEE/ACM International Conference on Data Science and Advanced Analytics + Paris, France + + March 2015 + 29 + + + + + + + Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees + + + F. + Frédéric + Cazals + + + + + NeurIPS 2019 - Thirty-third Conference on Neural Information Processing Systems + Vancouver, Canada + + December 2019 + + + + + + + + Comparing Two Clusterings Using Matchings between Clusters of Clusters + + + F. + Frédéric + Cazals + + + D. + Dorian + Mazauric + + + R. + Romain + Tetley + + + R. 
+ Rémi + Watrigant + + + + + ACM Journal of Experimental Algorithmics + + December 2019 + 24 + 1 + 1-41 + + + + + + + + Complexity dichotomies for the Minimum F -Overlay problem + + + + Journal of Discrete Algorithms + + September 2018 + 52-53 + 133-142 + + + + + + + A Sequential Non-Parametric Multivariate Two-Sample Test + + + F. + Frédéric + Cazals + + + + + IEEE Transactions on Information Theory + + May 2018 + 64 + 5 + 3361-3370 + + + + + + + High Resolution Crystal Structures Leverage Protein Binding Affinity Predictions + + + S. + Simon + Marillet + + + F. + Frédéric + Cazals + + + + + + March 2015 + RR-8733 + + + + + + + + Novel Structural Parameters of Ig–Ag Complexes Yield a Quantitative Description of Interaction + Specificity and Binding Affinity + + + S. + Simon + Marillet + + + M.-P. + Marie-Paule + Lefranc + + + F. + Frédéric + Cazals + + + + + Frontiers in Immunology + + February 2017 + 8 + 34 + + + + + + + Hybridizing rapidly growing random trees and basin hopping yields an improved exploration of + energy landscapes + + + A. + A. + Roth + + + T. + T. + Dreyfus + + + C. + C.H. + Robert + + + F. + F. + Cazals + + + + + J. Comp. Chem. + + 2016 + 37 + 8 + 739--752 + + + + + + + + Studying dynamics without explicit dynamics: A structure‐based study of the export mechanism by + AcrB + + + I. + Isabelle + Mus‐Veteau + + + F. + Frédéric + Cazals + + + + + Proteins - Structure, Function and Bioinformatics + + September 2020 + + + + + + + Boosting the analysis of protein interfaces with Multiple Interface String Alignments: + illustration on the spikes of coronaviruses + + + S. + Stéphane + Bereux + + + B. + B + Delmas + + + F. + Frédéric + Cazals + + + + + Proteins - Structure, Function and Bioinformatics + + November 2021 + + + + + + + Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary + reflections and sweet arithmetics + + + A. + Augustin + Chevallier + + + F. 
+ Frédéric + Cazals + + + + + Journal of Computational Geometry + + 2022 + + + + + + + Tripeptide loop closure: a detailed study of reconstructions based on Ramachandran + distributions + + + T. + T + O'donnell + + + C. H. + C H + Robert + + + F. + F + Cazals + + + + + Proteins - Structure, Function and Bioinformatics + + 2022 + + + + + + + Fréchet mean and <formula type="inline"><math xmlns="http://www.w3.org/1998/Math/MathML" + ><mi>p</mi></math></formula>-mean on the unit circle: decidability, algorithm, and applications to + clustering on the flat torus + + + F. + Frédéric + Cazals + + + B. + B + Delmas + + + T. + Timothee + O'donnell + + + + + SEA 2021 - 19th Symposium on Experimental Algorithms + Sophia Antipolis, France + + June 2021 + + + + + + + Graph Algorithm Techniques for Networks and Computational Structural Biology + + + + + November 2021 + + + + + + + Graph problems motivated by (low and high) resolution models of large protein + assemblies + + + V.-H. + Viet-Ha + Nguyen + + + + + + December 2021 + + + + + + + SARS-CoV-2 Through the Lens of Computational Biology:How bioinformatics is playing a key role + in the study of the virus and its origins + + + F. + Frédéric + Cazals + + + + + + March 2021 + 1-35 + + + + + + + On the complexity of overlaying a hypergraph with a graph with bounded maximum degree + + + F. + Frédéric + Havet + + + D. + Dorian + Mazauric + + + V.-H. + Viet-Ha + Nguyen + + + + + + 2021 + + + + + + + + Crystal structure of chloroplast fructose-1,6-bisphosphate aldolase from the green alga + Chlamydomonas reinhardtii + + + T. + Théo + Le Moigne + + + E. + Edoardo + Sarti + + + A. + Antonin + Nourisson + + + A. + Alessandra + Carbone + + + J. + Julien + Henri + + + + + + January 2022 + + + + + + + Gene prioritization based on random walks with restarts and absorbing states, to define gene + sets regulating drug pharmacodynamics from single-cell analyses + + + A. + Augusto + Sales-De-Queiroz + + + G. G. 
+ Guilherme Guilherme + Sales Santa Cruz + + + A. + Alain + Jean-Marie + + + D. + Dorian + Mazauric + + + F. + Frédéric + Cazals + + + + + + November 2021 + + + + + + Molecular dynamics: survey of methods for simulating the activity of proteins + + + S. + S.A. + Adcock + + + A. + A.J. + McCammon + + + + + Chemical reviews + + 2006 + 106 + 5 + 1589--1615 + + + + + + The molecular architecture of the nuclear pore complex + + + F. + F. + Alber + + + S. + S. + Dokudovskaya + + + L. + L.M. + Veenhoff + + + W. + W. + Zhang + + + J. + J. + Kipper + + + D. + D. + Devos + + + A. + A. + Suprapto + + + O. + O. + Karni-Schmidt + + + R. + R. + Williams + + + B. + B.T. + Chait + + + A. + A. + Sali + + + M. + M.P. + Rout + + + + + Nature + + 2007 + 450 + 7170 + 695--701 + + + + + + Dynamics on statistical samples of potential energy surfaces + + + K. + K.D. + Ball + + + R. + R.S. + Berry + + + + + The Journal of chemical physics + + 1999 + 111 + 5 + 2060--2070 + + + + + + Thermodynamics and an Introduction to Thermostatistics + + + H. + H.B. + Callen + + + + + + 1985 + Wiley + + + + + + De novo design of picomolar SARS-CoV-2 miniprotein inhibitors + + + L. + L. + Cao + + + I. + I. + Goreshnik + + + B. + B. + Coventry + + + J. + J.B. + Case + + + L. + L. + Miller + + + L. + L. + Kozodoy + + + R. + R. + Chen + + + L. + L. + Carter + + + A. + A. + Walls + + + Y.-J. + Y-J. + Park + + + E.-M. + E-M + Strauch + + + L. + L. + Stewart + + + M. + M.S. + Diamond + + + D. + D. + Veesler + + + D. + D. + Baker + + + + + Science + + 2020 + 370 + 6515 + 426--431 + + + + + + + Energy landscapes and persistent minima + + + J. + J. + Carr + + + D. + D. + Mazauric + + + F. + F. + Cazals + + + D. J. + D. J. + Wales + + + + + The Journal of Chemical Physics + + 2016 + 144 + 5 + 4 + + + + + + A practical volume algorithm + + + B. + B. + Cousins + + + S. + S. + Vempala + + + + + Mathematical Programming Computation + + 2016 + 8 + 2 + 133--160 + + + + + + Understanding molecular simulation + + + D. + D. 
+ Frenkel + + + B. + B. + Smit + + + + + + 2002 + Academic Press + + + + + + Random walks and an <formula type="inline"><math xmlns="http://www.w3.org/1998/Math/MathML" + ><mrow><msup><mi>O</mi> + <mo>*</mo> + </msup><mrow><mo>(</mo></mrow><msup><mi>n</mi> + <mn>5</mn> + </msup></mrow></math></formula>) volume algorithm for convex bodies + + + R. + R. + Kannan + + + L. + L. + Lovász + + + M. + M. + Simonovits + + + + + Random Structures & Algorithms + + 1997 + 11 + 1 + 1--50 + + + + + + A guide to Monte Carlo simulations in statistical physics + + + D. + D. + Landau + + + K. + K. + Binder + + + + + + 2014 + Cambridge university press + + + + + + Free energy computations: A mathematical perspective + + + T. + T. + Lelièvre + + + G. + G. + Stoltz + + + M. + M. + Rousset + + + + + + 2010 + World Scientific + + + + + + Prediction, determination and validation of phase diagrams via the global study of energy + landscapes + + + C. + C. + Schön + + + M. + M. + Jansen + + + + + Int. J. of Materials Research + + 2009 + 100 + 2 + 135 + + + + + + Improved protein structure prediction using potentials from deep learning + + + A. + A. + Senior + + + R. + R. + Evans + + + J. + J. + Jumper + + + J. + J. + Kirkpatrick + + + L. + L. + Sifre + + + T. + T. + Green + + + C. + C. + Qin + + + A. + A. + Żídek + + + A. + A. + Nelson + + + A. + A. + Bridgland + + + H. + H. + Penedones + + + S. + S. + Petersen + + + K. + K. + Simonyan + + + S. + S. + Crossan + + + K. + K. + Pushmeet + + + D. + D. + Jones + + + D. + D. + Silver + + + K. + K. + Kavukcuoglu + + + D. + D. + Hassabis + + + + + Nature + + 2020 + 1--5 + + + + + + Atomic-level characterization of the structural dynamics of proteins. + + + D. E. + D. E. + Shaw + + + P. + P. + Maragakis + + + K. + K. + Lindorff-Larsen + + + S. + S. + Piana + + + R. O. + R. O. + Dror + + + M. P. + M. P. + Eastwood + + + J. A. + J. A. + Bank + + + J. M. + J. M. + Jumper + + + J. K. + J. K. + Salmon + + + Y. + Y. + Shan + + + W. + W. 
+ Wriggers + + + + + Science + + 2010 + 330 + 6002 + 341--346 + + + + + + Energy Landscapes + + + D. J. + D. J. + Wales + + + + + + 2003 + Cambridge University Press + + + + + + Building force fields: an automatic, systematic, and reproducible approach + + + L.-P. + Lee-Ping + Wang + + + T. J. + Todd J + Martinez + + + V. S. + Vijay S + Pande + + + + + The journal of physical chemistry letters + + 2014 + 5 + 11 + 1885--1891 + + + + +
    diff --git a/RADAR2TEI/XSLT/RADAR2TEI.xsl b/RADAR2TEI/XSLT/RADAR2TEI.xsl new file mode 100644 index 0000000..5b46b31 --- /dev/null +++ b/RADAR2TEI/XSLT/RADAR2TEI.xsl @@ -0,0 +1,369 @@ + + + + + + + + href="../schema-TEI-RADAR/out/TEI_RADAR.rng" type="application/xml" + schematypens="http://relaxng.org/ns/structure/1.0" + href="../schema-TEI-RADAR/out/TEI_RADAR.rng" type="application/xml" + schematypens="http://purl.oclc.org/dsdl/schematron" + + + + + + + + + Title + + +

    Publication Information

    +
    + +

    Information about the source

    +
    +
    + + + + + +
    + + + + + + + + + + + +
    +
    + + + + + + + + + + + + + + + + +
    + + +
    +
    + + +
    + + + + + + + + + + +
    +
    + + + +
    + + + +
    +
    + + + +
    + + + + {@titre} + + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + hdr + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +