Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM)
PhD Position in Data Linking, Semantic Web, Artificial Intelligence and Machine Learning
LIRMM was created in 1992, the result of the merger of the CRIM and LAM laboratories, under the leadership of the CNRS and the University of Montpellier II.
JOB DETAILS
Published: 10 days ago
Application deadline: Not specified
Location: Montpellier, France

PhD Position in Data Linking, Semantic Web, Artificial Intelligence and Machine Learning

Title

Complementary datasets linking by means of knowledge graph augmentation and multimodal embeddings

Information

Employer: University of Montpellier

School: Doctoral School I2S, PhD in Informatics

When: Fall 2019

Duration: 36 months

Where: LIRMM, Montpellier, France

Collaboration: ANR Project D2KAB (www.d2kab.org), Project AgroLD (www.agrold.org)

Keywords

Semantic web, data linking, AI, machine learning, embeddings, text and data mining.

Technologies

Machine learning (scikit-learn, GloVe, word2vec, etc.), Semantic web technologies (OWL, RDF, SPARQL, triplestores, Linked Data)

Abstract

Web data linking is the problem of integrating heterogeneous datasets structured as knowledge graphs. This thesis aims to build on current solutions and go beyond the state of the art by proposing methods to interlink complementary datasets. Two datasets are said to be complementary if the entities they contain are described by largely non-intersecting sets of properties. We will develop methodological solutions and tools based on (1) knowledge graph augmentation techniques and (2) combined multimodal (text, graph) embeddings of entities. The new methods, designed to be as generic as possible, will be tested on the AgroLD knowledge base (www.agrold.org), which brings together a large number of agronomy datasets. Work will be carried out in close collaboration with the partners of the ANR D2KAB project (www.d2kab.org).

Detailed description

Linking (or interconnecting) data is an active research domain that aims to establish semantic links between entities described in different datasets. We are interested here in data represented as RDF (Resource Description Framework) knowledge graphs, published on the web as part of the collaborative Linked Open Data initiative, which today hosts more than 1100 datasets. The semantic links that we seek to establish are identity links, expressed by the "owl:sameAs" property of the OWL (Web Ontology Language) vocabulary. The difficulty comes from the high heterogeneity of the descriptions of entities found in different graphs [1,2]. The majority of existing linking tools assume that, for each pair of potentially matching entities, there is a common subset of properties that helps infer the identity link (or the lack thereof) [1]. However, in a number of real-world cases, this intersection is very small or empty; we then speak of complementary datasets. In particular, we are interested in agronomic data from the AgroLD knowledge base [5] (www.agrold.org), which exhibit this problem.
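As a rough illustration of what "complementary" means here, the sketch below measures how much the property sets of two entity descriptions overlap and serializes a candidate identity link as an N-Triples line. It is only a toy under stated assumptions: the entity URIs, property URIs and values are invented, and the Jaccard measure is one possible way to quantify the overlap, not a method prescribed by the project.

```python
# Toy entity descriptions: property URI -> value.
# All URIs and values are invented for illustration only.
entity_a = {
    "http://example.org/prop/label": "gene X",
    "http://example.org/prop/chromosome": "1",
    "http://example.org/prop/taxon": "example species",
}
entity_b = {
    "http://example.org/prop/comment": "free-text note about gene X",
    "http://example.org/prop/encodes": "protein Y",
}

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def property_overlap(a, b):
    """Jaccard overlap between the property sets describing two entities."""
    props_a, props_b = set(a), set(b)
    union = props_a | props_b
    return len(props_a & props_b) / len(union) if union else 0.0


def same_as_triple(uri_a, uri_b):
    """Serialize an identity link between two entities as one N-Triples line."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


# No shared property: a property-based comparison has nothing to work with.
print(property_overlap(entity_a, entity_b))
print(same_as_triple("http://example.org/a/geneX", "http://example.org/b/geneX"))
```

An overlap of zero, as in this toy pair, is the situation in which linking tools that rely on shared properties cannot compare the two descriptions directly.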

The question then arises of where to look for information to compare entities in different datasets. In a number of cases this information is present in the graphs, but in an unstructured form (e.g., in textual comments or annotations). Text-specific knowledge extraction methods may then be applied to extract this information. For example, a particularity of AgroLD is that most of its datasets contain text fields that are not described using standardized terminologies or ontologies. As a result, the discoveries that could be made by searching these resources are limited. We will therefore be interested in the automatic extraction of entities of interest and relations from these textual fields in order to structure the information they contain and make it usable [3,4]. The AgroPortal Annotator is one possible tool for this [8,9].
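To make the idea of structuring free-text fields concrete, here is a minimal dictionary-based annotation sketch. It is not the AgroPortal Annotator and does not use its API; the lexicon, the concept URIs and the sample comment are all hypothetical, and a real annotator would handle synonyms, inflection and ambiguity.

```python
import re

# Hypothetical mini-lexicon: surface form -> concept URI (invented URIs).
LEXICON = {
    "drought tolerance": "http://example.org/onto/DroughtTolerance",
    "root": "http://example.org/onto/Root",
    "rice": "http://example.org/onto/Rice",
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, surface form, concept URI) for lexicon terms in text."""
    annotations = []
    lowered = text.lower()
    for term, uri in lexicon.items():
        # Whole-word, case-insensitive match of the lexicon entry.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            surface = text[match.start():match.end()]
            annotations.append((match.start(), match.end(), surface, uri))
    return sorted(annotations)


comment = "Gene involved in drought tolerance, expressed in the root of rice."
for annotation in annotate(comment):
    print(annotation)
```

Each annotation pairs a span of the text field with a concept identifier, which is the kind of structured statement that could then be added to the graph and compared across datasets.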

In addition, a number of knowledge base augmentation approaches exist, which make it possible to automatically complete missing knowledge by using background knowledge, i.e., information contained in other established knowledge graphs (DBpedia, Wikidata, etc.). We propose here to use and adapt these methods for the particular task of linking complementary datasets, by automatically augmenting the knowledge in these datasets to allow comparison. We will also use textual data (including scientific articles) for the knowledge-building task. Finally, we will tackle the definition and application of lexical embeddings of entities across different modalities (text, graph, social networks) [6,7] that will allow the semantic comparison of these entities.
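The comparison of entities through multimodal embeddings can be sketched as follows. This is a minimal illustration under assumptions that are not in the posting: each entity is assumed to already have one text vector and one graph vector (here tiny hand-written toy vectors), the modalities are combined by simple concatenation after normalization, and similarity is measured with the cosine. Other combination strategies are possible.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec, graph_vec):
    """Concatenate per-modality vectors after normalizing each one,
    so that neither modality dominates because of its scale."""
    return l2_normalize(text_vec) + l2_normalize(graph_vec)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy vectors standing in for learned text and graph embeddings.
entity_a = combine(text_vec=[0.9, 0.1, 0.0], graph_vec=[0.2, 0.8])
entity_b = combine(text_vec=[0.8, 0.2, 0.1], graph_vec=[0.1, 0.9])
entity_c = combine(text_vec=[0.0, 0.1, 0.9], graph_vec=[0.9, 0.1])

print(cosine(entity_a, entity_b))  # higher: similar in both modalities
print(cosine(entity_a, entity_c))  # lower: dissimilar in both modalities
```

Pairs whose combined vectors are close would be candidates for an identity link even when their structured descriptions share no property.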

Tasks to accomplish:

  • To establish a detailed state of the art on web-data linking, automatic knowledge base augmentation and entity/graph embeddings;
  • To propose a baseline method for linking complementary datasets using knowledge augmentation methods and methods for extracting named entities from text;
  • To exploit different modalities (texts, graphs, social networks) to define and apply embeddings of entities to discover their semantic similarity;
  • To evaluate and apply these methods on real agronomic data and other reference benchmarks.

Expected profile

We are looking for a motivated junior researcher with experience in machine learning and semantic web technologies. The candidate should match most of the following criteria:

- High motivation for scientific research

- Knowledge of semantic web technologies, especially JSON/RDF/SPARQL.

- Experience with machine learning tools (e.g., Python’s scikit-learn)

- Knowledge of text and data mining techniques (e.g., named entity recognition)

- Excellent technical skills to conduct experiments with real-world and benchmark data

- Perfect spoken and written English

- Autonomy and initiative: ability to make technical decisions within the project and justify choices

- Basic knowledge of French, with the aim of learning the language during the contract

- Excellent writing skills, as reports, documentation, and technical notes will always be necessary

Application

Applications for this position will be accepted EXCLUSIVELY via the following platform:

https://www.indeed.fr/emploi/phd-position-data-linking-semantic-web-ai-machine-learning-6ad127de4b6dc118

Required documents (combine everything into a single PDF file):

- a curriculum vitae describing your education and experience;

- a motivation letter describing your interest in the position and the matches with the expected profile;

- a link to your master's thesis or to relevant publications;

- copies of your transcripts of records (master, bachelor);

- names and contact details of referees.

No application by email will be accepted. For more information about this position, please contact Konstantin Todorov (todorov@lirmm.fr) and Clement Jonquet (jonquet@lirmm.fr). Please do not attach documents; include links instead if you would like to send a document.

Remote and face-to-face interviews will be organized.

Contract

The successful candidate will hold a scholarship from the French Ministry of Higher Education, Research and Innovation (1600€/month) for a period of three years. Social security and benefits are included. The scholarship can be complemented with teaching activities.

References

[1] Achichi, M., Bellahsene, Z., Ben Ellefi, M., Todorov, K. (2019, in press). Linking and Disambiguating Entities Across Heterogeneous RDF Graphs. Journal of Web Semantics.

[2] Manel Achichi, Zohra Bellahsene, Konstantin Todorov. A survey on web data linking. Ingénierie des Systèmes d'Information (ISI) 21(5-6): 11-29 (2016).

[3] Rafael Vieira and Kate Revoredo. Using Word Semantics on Entity Names for Correspondence Set Generation. OAEI 2017 challenge.

[4] Yuanzhe Zhang, Xuepeng Wang, Siwei Lai, Shizhu He, Kang Liu, Jun Zhao, and Xueqiang Lv. Ontology Matching with Word Embeddings. CNC, CCL 2014. LNCS, vol. 8801

[5] Aravind Venkatesan, Gildas Tagny Ngompe, Nordine El Hassouni, Imene Chentli, Valentin Guignon, Clement Jonquet, Manuel Ruiz, Pierre Larmande. Agronomic Linked Data (AgroLD): a Knowledge-based System to Enable Integrative Biology in Agronomy. PLoS ONE, 2018.

[6] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. "Distributed representations of words and phrases and their compositionality." In Advances in Neural Information Processing Systems (NIPS), 2013.

[7] Ristoski, P., and Paulheim, H. "RDF2Vec: RDF graph embeddings for data mining." In ISWC, 2016.

[8] Jonquet, C., Toulet, A., Arnaud, E., Aubin, S., Yeumo, E. D., Emonet, V., ... & Larmande, P. (2018). AgroPortal: A vocabulary and ontology repository for agronomy. Computers and Electronics in Agriculture, 144, 126-143.

[9] Tchechmedjiev, A., Abdaoui, A., Emonet, V., Zevio, S., & Jonquet, C. (2018). SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes. BMC bioinformatics, 19(1), 405.
