SUAM stands for Speeding Up data Analysis for Mexico with artificial intelligence and distributed computing. It is a project based on previous work whose results are being evaluated by a Scientific Committee.
Concerned about the current SARS-CoV-2 situation in Mexico, a group of young, enthusiastic and very capable people decided to start this project.
The main idea, that is, bringing several technologies together into one, as well as the architecture, is based on a series of articles: https://scholar.google.com/citations?user=wpPYUQUAAAAJ&hl=en
The SUAM project combines some of the best software (to our knowledge), tools and related technologies in the following fields:
- Bioinformatics.
- Machine learning (ML). Focused on classification and clustering.
- Deep learning (DL).
- Distributed computing (DC).
- For bioinformatics:
- For ML/DL:
- For DC:
The project folders are:
- `bio`. Contains the bioinformatics framework for sequence alignments; it is also intended for future use in, among others, molecular mechanics.
- `cl`. Stands for classifiers; related to classification problems in ML.
- `dl`. Framework for DL. Currently supporting: Keras, PyTorch and Scikit-learn.
- `parsers`. Defines the `JSONParser` class in its `__init__` file. This class is responsible for parsing the main JSON configuration file, where the tools (for bioinformatics, classification and deep learning) and their parameters can be specified.
- `runners` and `tests`. These folders could be deprecated in future versions (note that each of the `bio`, `cl` and `dl` folders contains its own folders, as required).
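As a rough illustration of the role the `parsers` folder plays, the sketch below reads a JSON configuration and returns the parameters of one tool. It is not the project's `JSONParser`: the class name `ConfigSketch`, the `tools` and `parameters` keys, and the example entries are all assumptions made up for this example, since the real layout of the configuration file is not described here.

```python
import json
from typing import Any, Dict, List


class ConfigSketch:
    """Hypothetical stand-in for a JSON configuration parser (not SUAM's JSONParser)."""

    def __init__(self, text: str) -> None:
        # Parse the raw JSON text once and keep the resulting dictionary.
        self._data: Dict[str, Any] = json.loads(text)

    def tool_names(self) -> List[str]:
        # Assumed layout: a top-level "tools" object keyed by tool name.
        return sorted(self._data.get("tools", {}))

    def parameters(self, tool: str) -> Dict[str, Any]:
        # Return the parameters of one tool, or fail with a clear message.
        tools = self._data.get("tools", {})
        if tool not in tools:
            raise KeyError(f"tool not configured: {tool}")
        return tools[tool].get("parameters", {})


# Made-up configuration, used only to exercise the sketch.
EXAMPLE = """
{
  "tools": {
    "clustalo": {"parameters": {"threads": 2}},
    "muscle": {"parameters": {"maxiters": 4}}
  }
}
"""

if __name__ == "__main__":
    cfg = ConfigSketch(EXAMPLE)
    print(cfg.tool_names())
    print(cfg.parameters("muscle"))
```

Keeping the lookup behind a small class means callers never touch the raw dictionary, so a change in the file layout would only affect one place.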
In the `bio`, `cl` and `dl` folders you will find, among others, the following two main files:
- `cfg.json`. Contains the configuration for each tool supported in the project.
- `requirements.prod`. The required Python modules for each case (bioinformatics, classifier and deep learning). Remember to run `pip install -r requirements.prod` before anything else.
The `bio` folder and its tests are the most advanced compared to ML/DL. This is deliberate because, unlike the ML/DL tools, the bioinformatics tools (Clustal Omega and MUSCLE) are not tightly tied to Python.
So, to reduce the required time and effort, the `bio` folder contains a `scripts` folder where, among others, you will find the `install.sh` file. Please run this file to be able to execute Clustal Omega and MUSCLE, which are dependencies of SUAM's bioinformatics framework.
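After running the install script, it can be useful to confirm that both aligners are reachable. The following is a minimal sketch, assuming the executables are on the `PATH` under the names `clustalo` and `muscle` (the names these tools are commonly installed with; the names used by `install.sh` may differ). The helper `missing_tools` is hypothetical and not part of SUAM.

```python
import shutil
from typing import Iterable, List

# Assumed executable names; adjust them if install.sh uses different ones.
ALIGNERS = ("clustalo", "muscle")


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ALIGNERS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Clustal Omega and MUSCLE are available.")
```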
Finally, as mentioned, the `bio` folder contains `tests` and their `results` (in folders of the same names).
- We are currently working on a new (second) paper with the first results from our experiments, and we expect to release the first version of the architecture in the coming days.
- Build and run the test sets for ML/DL.
- Analyse the results of the step above.
- Start a new article.
We're looking for young, enthusiastic and very capable people: software engineers, data scientists, computing specialists and information technology (IT) specialists, at the student level (with more than 50% of the curriculum completed) and/or the engineer level.
If you believe you can help with this, please write to: [[email protected]](mailto:[email protected]?subject=SUAM Colaborator "[email protected]")
We are not currently looking for money; right now we are looking for devices to build a cluster for data processing. Do you have an old machine that you can't sell or that you want to dispose of? Don't sell it, don't dispose of it: donate it to the project.
If you believe you can help with this, please write to: [[email protected]](mailto:[email protected]?subject=SUAM Sponsor "[email protected]")
Are you a small or medium-sized organization? Would you like your company logo to appear on the project's site or in its derivative works?
If you believe you can help with this, please write to: [[email protected]](mailto:[email protected]?subject=SUAM Supporter "[email protected]")