Mariia Smirnova

Machine Learning Engineer & NLP Data Scientist
Moscow, Russia


About Mariia Smirnova:

Currently in Moscow, Russia. 
Open to opportunities in Europe, UK, UAE. Ready for business trips and relocation.

General knowledge, skills & experience:
- Python 3, OOP;
- Git, DVC;
- Docker;
- implementation of machine learning model training via Kubeflow Pipelines (`kfp` library);
- deep learning model architecture development with Keras, PyTorch and PyTorch Lightning;
- ML model training with sklearn and catboost;
- text data analysis and preprocessing via NLTK, Pymorphy, FastText, Word2Vec, Pyenchant, eli5;
- NER rules and grammars development via Yargy library;
- graph modeling and analysis via NetworkX, Pyvis;
- data gathering via SQL (joins, aggregation and window functions where necessary), MapReduce query optimization, working experience with Microsoft SQL Server, MS Access, HDFS + Hive, ClickHouse;
- data gathering via PySpark.

Advanced English (confirmed by the Cambridge BEC Higher certificate, C1 level).

Work experience

  1. Machine Learning Engineer @ LLC VKontakte (vk.com social network)
    May 2021 – Present
    Working on the VKontakte (vk.com) social network's classified advertising service and market service as part of the ML development team.
    The classified advertising service lets social network users post classifieds to sell products, communicate with the sellers of other goods, and buy those goods directly. The market service offers the same functionality but is aimed mostly at businesses rather than individuals acting as sellers.

    Tasks and experience:
    1. Developed a Python library from the ground up for the ML team, aimed at solving ML tasks, including:
    - NLP data processing (text cleaning, text normalization, etc.);
    - ML model training (model classes, model inference classes, custom losses, custom Dataset and Collator classes, etc.);
    - functionality for working with storage systems.
    The library's ML functionality is based on `Pytorch` and `Pytorch-lightning`.
    2. Developed an ML pipeline for additional training of the CLIP model on the services' data.
    3. Trained and deployed ML models that solve the following tasks:
    - item name recognition in ad text (`Fasttext` embeddings, with self-attention and an LSTM as the model core);
    - post classification (byte-pair encoding, attention and a multilayer perceptron);
    - posts and goods classification (`Optuna` parameters tuning + `Catboost`);
    - binary classification for ordinary and business users' accounts (`Optuna` parameters tuning + `Catboost`).
    4. Conducted analytical research on the category trees of competitors' markets using graph theory methods and centrality metrics via `Networkx` and `Pyvis`.
    5. Developed and implemented a Kubeflow pipeline (via the `Kfp` library) for ML model data processing, training, inference testing and checkpoint uploading.
    6. Gave a talk at a meeting for developers and analysts on the topic "ML models' interpretability".
  2. Machine Learning Engineer @ PJSC Sberbank
    November 2019 – May 2021
    Worked in the NLP data science team on the development of the call center operator's automated workplace. The service processes client-operator phone conversations in real time using speech-to-text and NLP technologies and extracts structured information from the dialogue, which is used to guide the operator through the client consultation scenario.
    Tasks and experience:
    1. Development of the intent classification service using an ML ensemble architecture:
    - operator intent classification model: LinearSVC on TF-IDF vectorizer features, with GridSearchCV for parameter tuning;
    - client intent classification model: a perceptron neural network (developed with Keras) on TF-IDF vectorizer features, with the predicted operator intent used as context for the client's answer.
    Main metric: macro F1-score.
    Used Tensorboard to monitor model training.
    During service development and model retraining, data and error analysis was performed, which led to a ~5% increase in the metric value.
    2. Development of the intent classification service using a Bi-LSTM architecture (developed with Pytorch).
    The architecture improves classification results because the model can consider all previous phrases within a dialogue as context for the next phrase.
    3. Development of the library for domain specific named entities recognition via Yargy:
    - client-related entities: full name, birthdate, phone number, full address (country, area, settlement, street, house number, flat number, floor number, intercom code) etc.;
    - product-related entities: card types, card names, operation currencies, transaction amounts, transaction limits, tariffs, operation regions etc.
    4. Development of a monitoring service that computes NLP pipeline performance metrics (model speed, errors):
    - exporting service (Prometheus);
    - Grafana dashboard.


Education

  1. Financial University under the Government of the Russian Federation, 2014 – 2018
    Faculty of Applied Mathematics and Information Technologies, Business Informatics Department

    BSc, honors degree in Business Informatics 
    GPA: 4.75/5
  2. Higher School of Economics, 2018 – 2020
    Business informatics department
    MSc, honors degree in Big Data Systems
    The programme was taught entirely in English, including all homework, course papers and the master's thesis.
    GPA: 8.75/10
