Announcing: Internships at Stardog Labs

At Stardog, we’re at the forefront of transforming how organizations harness the power of data through Knowledge Graphs (KGs) and Large Language Models (LLMs). We’re excited to announce an opportunity for student interns to join our innovative team, where you can make a tangible impact on the future of data integration and AI.

The Stardog internship program combines real-world problems with state-of-the-art research questions in AI. During the internship, you will engage in hands-on projects that involve building prototypes. You will collaborate with experienced professionals and contribute to developing solutions used by the largest organizations in the world. This is a unique chance to build your skills in a growing field while contributing to the development of impactful technologies.

Suggested Internship Topics

A few suggested internship topics are provided below to guide and inspire students in choosing the area they want to work on. If the topic you are interested in is not listed below, we are open to discussing it and hearing your ideas.

Reasoning with LLMs & GNNs over KGs

Reasoning over Knowledge Graphs (KGs) has historically focused on symbolic, logic-based methods, but more recently numerical and neural methods utilizing LLMs and Graph Neural Networks (GNNs) have emerged. We aim to push the boundaries of current techniques, focusing on enhancing the ability of KGs to handle dynamic, real-world data while enabling LLMs and GNNs to efficiently reason over structured information. Key topics include developing advanced embedding models to improve KG completion, creating universal reasoning frameworks through prompt-based learning, and introducing efficient algorithms that streamline the integration of graph structures with LLMs.

Reference papers:

Inference over Unseen Entities, Relations and Literals on Knowledge Graphs

Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion

A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning

LLaGA: Large Language and Graph Assistant

Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph

Struct-X: Enhancing Large Language Models Reasoning with Structured Data

Query Generation with LLMs

Creating structured queries (SQL or SPARQL) from natural language questions enables non-technical users to interact with complex databases without needing to learn a query syntax. Stardog Voicebox uses this approach to provide hallucination-free answers to user questions.

Reference papers:

SPARQL Generation with Entity Pre-trained GPT for KG Question Answering

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

SPARKLE: Enhancing SPARQL Generation with Direct KG Integration in Decoding

Knowledge Graph Construction

Structured query generation cannot utilize the information contained in unstructured documents. Knowledge Graph Construction aims to augment KGs with the data extracted from unstructured documents. The Safety RAG approach adopted in Stardog accomplishes this while providing more safety controls against hallucinations. However, there are still outstanding challenges with respect to performance and accuracy, especially over specialized domains. Our goal is to improve the current KG construction approaches to be more flexible, customizable, and performant.

Reference papers:

Finetuning Generative Large Language Models with Discrimination Instructions for Knowledge Graph Completion

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget

Aligning Multiple Knowledge Graphs in a Single Pass

Knowledge Graph Generation From Text

Model Serving, Inference, Attention variants

The capabilities exhibited by LLMs improve as the size of the model increases, but the high computational and memory requirements for serving LLMs limit the applicability of larger models in practice. As a result, improving the latency and throughput of LLM inference has garnered a lot of attention recently. Optimization approaches range from utilizing Key-Value (KV) caches more effectively, to using different speculative decoding strategies, to completely new attention algorithms or new tensor management architectures. We are interested in researching these approaches and utilizing them to improve Stardog Voicebox performance.
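As a rough illustration of one of these ideas, the sketch below shows greedy speculative decoding in plain Python. It is a toy under stated assumptions and not a description of how Voicebox or any particular serving stack works: `target_model` and `draft_model` are hypothetical stand-ins for a large and a small LLM, tokens are small integers, and the verification loop stands in for what would be a single batched forward pass of the target model.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "next token" function


def target_model(seq: List[Token]) -> Token:
    """Hypothetical large model: a deterministic toy rule, not a real LLM."""
    return (seq[-1] * 3 + len(seq)) % 11


def draft_model(seq: List[Token]) -> Token:
    """Hypothetical small model: agrees with the target except at every 4th position."""
    guess = target_model(seq)
    return guess if len(seq) % 4 else (guess + 1) % 11


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft up to k tokens, keep the prefix the target agrees with, repeat."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    verification_rounds = 0  # each round models ONE batched target pass
    while len(seq) < goal:
        # 1. The cheap draft model proposes a short continuation.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks the proposal position by position.
        verification_rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
        else:
            # Whole proposal accepted: the target pass also yields one extra token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, verification_rounds


tokens, rounds = speculative_decode(target_model, draft_model, [1], n_new=12, k=4)
```

Because every kept token is checked against the target model, the output is the same as plain greedy decoding with the target alone; any gain comes from needing fewer target passes than generated tokens.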

Reference papers:

Hydragen: High-Throughput LLM Inference with Shared Prefixes

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Accelerating LLM Inference with Staged Speculative Decoding

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Graph-Structured Speculative Decoding

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads

Differential Transformer

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

Graph Query Engine

Stardog’s Graph Query Engine is a core component of the Stardog platform that many other features, such as Voicebox, depend on. Its performance and robustness are essential. Many of the improvements made over the years are based on database research and experience from other database products.

General query engine improvements

Fuzzing for SPARQL correctness. Fuzz testing is a very powerful method for randomized testing that is already popular in the SQL context. We are considering either investigating how the popular SQLancer approach can be used for SPARQL or developing Stardog-specific techniques for SPARQL fuzzing.
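To make the idea concrete, here is a minimal sketch of randomized differential testing over a toy triple store. It is only an illustration under stated assumptions: the random graphs, the two-pattern basic graph patterns, and both evaluators (`eval_reference` and `eval_indexed`) are hypothetical stand-ins, not Stardog or SQLancer code. SQLancer's own test oracles are metamorphic and do not need a second implementation; comparing against a naive reference is simply the easiest variant to show in a few lines.

```python
import random

VARS = ["?x", "?y", "?z"]


def random_graph(rng, size=40, nodes=6, preds=3):
    """A random set of (subject, predicate, object) triples."""
    return {(f"n{rng.randrange(nodes)}", f"p{rng.randrange(preds)}",
             f"n{rng.randrange(nodes)}") for _ in range(size)}


def random_term(rng, nodes):
    """Either a variable or a constant node."""
    return rng.choice(VARS) if rng.random() < 0.6 else f"n{rng.randrange(nodes)}"


def random_bgp(rng, nodes=6, preds=3, n_patterns=2):
    """A random basic graph pattern: triple patterns with constant predicates."""
    return [(random_term(rng, nodes), f"p{rng.randrange(preds)}", random_term(rng, nodes))
            for _ in range(n_patterns)]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def extend(solutions, pattern, candidates):
    """Join the current solutions with all candidate triples matching `pattern`."""
    out = []
    for binding in solutions:
        for triple in candidates:
            m = match(pattern, triple, binding)
            if m is not None:
                out.append(m)
    return out


def eval_reference(graph, bgp):
    """Naive oracle: scan the whole graph for every triple pattern."""
    solutions = [{}]
    for pattern in bgp:
        solutions = extend(solutions, pattern, graph)
    return solutions


def eval_indexed(graph, bgp):
    """'Engine under test': the same join, but driven by a predicate index."""
    by_pred = {}
    for triple in graph:
        by_pred.setdefault(triple[1], []).append(triple)
    solutions = [{}]
    for pattern in bgp:
        solutions = extend(solutions, pattern, by_pred.get(pattern[1], []))
    return solutions


def canonical(solutions):
    """Order-independent form of a solution multiset."""
    return sorted(tuple(sorted(b.items())) for b in solutions)


def fuzz(engine, rounds=200, seed=0):
    """Return the first (graph, query) on which `engine` disagrees with the oracle."""
    rng = random.Random(seed)
    for _ in range(rounds):
        graph, bgp = random_graph(rng), random_bgp(rng)
        if canonical(eval_reference(graph, bgp)) != canonical(engine(graph, bgp)):
            return graph, bgp
    return None


counterexample = fuzz(eval_indexed)
```

A real SPARQL fuzzer would also need to generate OPTIONAL, FILTER, aggregation, and property path constructs, and to shrink any failing query to a small reproducer.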

Query planning and optimization

Cardinality Estimation over Knowledge Graphs with Embeddings and Graph Neural Networks. Accurate cardinality estimates are critically important for cost-based query optimization, particularly join ordering. The paper reports on successfully using modern ML to estimate cardinalities in knowledge graphs.

Wander Join and XDB: Online Aggregation via Random Walks. An interesting sampling approach for approximate joins and cardinality estimation.

Common sub-expression elimination. A very common optimization in SQL systems that is currently missing in Stardog. It needs to be adapted to identify repeated graph query patterns, so that they are only evaluated once and the results are reused.
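The last item can be sketched in a few lines. The example below is a toy under stated assumptions, not Stardog's planner: plans are nested tuples, the data and the `scan`, `join`, and `evaluate` helpers are hypothetical, and reuse happens through a result cache at execution time, whereas a real optimizer would detect equivalent sub-plans during planning (including patterns that differ only in variable names).

```python
# Toy data: (subject, predicate, object) triples. All names are illustrative.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "dave"),
    ("bob", "worksAt", "acme"),
    ("bob", "livesIn", "paris"),
    ("dave", "livesIn", "rome"),
]


def scan(s_var, predicate, o_var):
    """Match the triple pattern `s_var predicate o_var` against TRIPLES."""
    return [{s_var: s, o_var: o} for s, p, o in TRIPLES if p == predicate]


def join(left, right):
    """Nested-loop natural join on the variables both sides share."""
    out = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def evaluate(plan, cache, stats):
    """Evaluate a plan tuple. If `cache` is a dict, identical sub-plans run only once."""
    if cache is not None and plan in cache:
        stats["reused"] += 1
        return cache[plan]
    op = plan[0]
    if op == "scan":
        stats["scans"] += 1
        result = scan(*plan[1:])
    else:
        left = evaluate(plan[1], cache, stats)
        right = evaluate(plan[2], cache, stats)
        result = join(left, right) if op == "join" else left + right  # "union"
    if cache is not None:
        cache[plan] = result
    return result


# The pattern `?a knows ?b` appears in both branches of the union.
knows = ("scan", "?a", "knows", "?b")
plan = ("union",
        ("join", knows, ("scan", "?b", "worksAt", "?c")),
        ("join", knows, ("scan", "?b", "livesIn", "?c")))

plain_stats = {"scans": 0, "reused": 0}
plain_rows = evaluate(plan, None, plain_stats)   # no reuse

cse_stats = {"scans": 0, "reused": 0}
cse_rows = evaluate(plan, {}, cse_stats)         # repeated sub-plan evaluated once
```

With the cache enabled, the shared `knows` scan runs once and its result is reused in the second branch, while the final rows are unchanged.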

Query execution

Encapsulation of parallelism in the Volcano query processing system. All Stardog query execution is currently single-threaded. The paper describes specific operators that can adapt the pull-based execution model used in Stardog (so-called Volcano model) to support intra-query parallelism.

Factorized (intermediate) query results (theory, see also a more practical application in graph databases). Compact representation of intermediate results helps, in certain scenarios, to avoid Cartesian explosions and dramatically improve performance.
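The benefit of factorization can be seen in a small sketch. The code below is an illustration under stated assumptions rather than the representation from the referenced work: it handles only a single equi-join of two hypothetical binary relations `r(a, k)` and `s(k, b)`, and stores, per join key, the two sides side by side instead of their Cartesian product.

```python
from collections import defaultdict
from itertools import product


def factorized_join(r, s):
    """Join r(a, k) with s(k, b) on k, keeping each key's two sides unexpanded."""
    left, right = defaultdict(list), defaultdict(list)
    for a, k in r:
        left[k].append(a)
    for k, b in s:
        right[k].append(b)
    # One entry per join key: (all a-values, all b-values).
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


def count_rows(factorized):
    """Number of flat result rows, computed without enumerating them."""
    return sum(len(a_vals) * len(b_vals) for a_vals, b_vals in factorized.values())


def stored_values(factorized):
    """Number of values actually held in the factorized form."""
    return sum(len(a_vals) + len(b_vals) for a_vals, b_vals in factorized.values())


def expand(factorized):
    """Lazily enumerate the flat (a, k, b) rows only when they are really needed."""
    for k, (a_vals, b_vals) in factorized.items():
        for a, b in product(a_vals, b_vals):
            yield (a, k, b)


# 100 rows on each side, all sharing one join key.
r = [(f"a{i}", "k") for i in range(100)]
s = [("k", f"b{j}") for j in range(100)]
f = factorized_join(r, s)
flat_size = count_rows(f)         # size of the flat join result
compact_size = stored_values(f)   # size of the factorized form
```

In this example the flat result has 100 × 100 rows, while the factorized form keeps only 100 + 100 values, and aggregates such as the row count can be computed directly on the compact form.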

To Learn More or to Apply

Send an email to ai-internship@stardog.com and include the following:

updated CV

cover letter stating areas of interest (including new things we don’t mention above)

time period of availability (starting & ending dates)

your work location for the internship (remote preferred)
