At Stardog, we’re at the forefront of transforming how organizations harness the power of data through Knowledge Graphs (KGs) and Large Language Models (LLMs). We’re excited to announce an opportunity for student interns to join our innovative team, where you can make a tangible impact on the future of data integration and AI.
The Stardog internship program combines real-world problems with state-of-the-art research questions in AI. During your internship at Stardog, you will engage in hands-on projects that involve building prototypes, collaborate with experienced professionals, and contribute to developing solutions used by some of the largest organizations in the world. This is a unique chance to build your skills in a growing field while contributing to the development of impactful technologies.
Suggested Internship Topics
A few suggested internship topics are provided below to guide and inspire students in choosing an area to work on. If the topic you are interested in is not listed below, we are open to discussing your ideas.
Reasoning with LLMs & GNNs over KGs
Reasoning over Knowledge Graphs (KGs) has historically focused on symbolic, logic-based methods, but numerical and neural methods utilizing LLMs and Graph Neural Networks (GNNs) have emerged more recently. We aim to push the boundaries of current techniques, focusing on enhancing the ability of KGs to handle dynamic, real-world data while enabling LLMs and GNNs to reason efficiently over structured information. Key topics include developing advanced embedding models to improve KG completion, creating universal reasoning frameworks through prompt-based learning, and introducing efficient algorithms that streamline the integration of graph structures with LLMs.
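To make the embedding idea concrete, here is a minimal sketch of link prediction with a TransE-style scoring function. It is one of many possible embedding models, not necessarily what an intern project would use, and the entities, relations, and random vectors are purely illustrative.

```python
# Minimal sketch of KG completion with a TransE-style embedding model.
# Entity/relation names and the toy data are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
entities = ["alice", "bob", "acme_corp"]
relations = ["works_for", "knows"]
dim = 16

E = {e: rng.normal(size=dim) for e in entities}   # entity embeddings
R = {r: rng.normal(size=dim) for r in relations}  # relation embeddings

def score(h, r, t):
    """TransE plausibility score: smaller ||h + r - t|| means more plausible."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

# KG completion: rank candidate tails for the query (alice, works_for, ?)
candidates = sorted(entities, key=lambda t: score("alice", "works_for", t), reverse=True)
print(candidates)
```

In practice the embeddings would be trained on observed triples rather than drawn at random, and more expressive models and GNN encoders can replace the simple translational score.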
Reference papers:
Query Generation with LLMs
Creating structured queries (SQL or SPARQL) from natural language questions enables non-technical users to interact with complex databases without needing to learn a query syntax. Stardog Voicebox uses this approach to provide hallucination-free answers to user questions.
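As a rough illustration of the idea (not of how Voicebox is implemented), the sketch below prompts an LLM with a schema and a question and expects a SPARQL query back. `call_llm`, the schema, and the prompt template are placeholders you would replace with your own client and model.

```python
# Sketch of natural-language-to-SPARQL generation. `call_llm` is a placeholder
# for whatever LLM client you use; the schema and question are made up.
SCHEMA = """
:Employee a owl:Class .
:worksFor rdfs:domain :Employee ; rdfs:range :Department .
:name rdfs:domain :Employee ; rdfs:range xsd:string .
"""

PROMPT_TEMPLATE = """Given this RDF schema:
{schema}

Write a SPARQL query that answers the question below.
Return only the query, with no explanation.

Question: {question}
"""

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your preferred LLM API call here.
    raise NotImplementedError

def question_to_sparql(question: str) -> str:
    prompt = PROMPT_TEMPLATE.format(schema=SCHEMA, question=question)
    query = call_llm(prompt)
    # In practice the generated query should be validated (parsed, checked
    # against the schema) before execution, which is part of what keeps
    # answers grounded in the data rather than hallucinated.
    return query
```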
Reference papers:
Knowledge Graph Construction
Structured query generation cannot utilize the information contained in unstructured documents. Knowledge Graph Construction aims to augment KGs with data extracted from unstructured documents. The Safety RAG approach adopted in Stardog accomplishes this while providing stronger controls against hallucinations. However, there are still outstanding challenges with respect to performance and accuracy, especially over specialized domains. Our goal is to make current KG construction approaches more flexible, customizable, and performant.
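A minimal sketch of the overall flow is shown below, with the LLM extraction step stubbed out. The namespace, the example triple, and `extract_triples` are illustrative placeholders; this is not the Safety RAG implementation.

```python
# Sketch of LLM-assisted KG construction: extract (subject, predicate, object)
# triples from text, then materialize them as RDF with rdflib.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.com/")

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Placeholder for an LLM extraction call that returns triples grounded
    # in the source document (the grounding enables downstream safety checks).
    return [("Acme", "headquarteredIn", "Berlin")]

def build_graph(documents: list[str]) -> Graph:
    g = Graph()
    g.bind("ex", EX)
    for doc in documents:
        for s, p, o in extract_triples(doc):
            g.add((EX[s], EX[p], Literal(o)))
    return g

g = build_graph(["Acme is headquartered in Berlin."])
print(g.serialize(format="turtle"))
```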
Reference papers:
Model Serving, Inference, Attention variants
The capabilities exhibited by LLMs improve as model size increases, but the high computational and memory requirements of serving LLMs limit the applicability of larger models in practice. As a result, improving the latency and throughput of LLM inference has garnered a lot of attention recently. Optimization approaches range from utilizing Key-Value (KV) caches more effectively, to different speculative decoding strategies, to entirely new attention algorithms and tensor management architectures. We are interested in researching these approaches and applying them to improve Stardog Voicebox performance.
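For intuition, the toy NumPy snippet below illustrates why KV caching helps: each decoding step computes keys and values only for the newest token and reuses the cached ones for the prefix. It is a single-head, unbatched illustration with random weights, not a sketch of any production serving system.

```python
# Toy illustration of KV caching in autoregressive attention (single head,
# no batching). Each decoding step only computes K/V for the new token and
# appends them, instead of re-encoding the whole prefix.
import numpy as np

dim = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

k_cache, v_cache = [], []

def decode_step(x):
    """x: hidden state of the newest token, shape (dim,)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    k_cache.append(k)                        # cache grows with the sequence ...
    v_cache.append(v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    attn = softmax(q @ K.T / np.sqrt(dim))   # ... so per-step cost is O(prefix)
    return attn @ V                          # attention output for the new token

for _ in range(5):                           # five decoding steps over random "tokens"
    out = decode_step(rng.normal(size=dim))
print(out.shape)                             # (8,)
```

The memory that such caches consume per request is exactly what techniques like paged KV-cache management and new attention variants try to reduce.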
Reference papers:
Graph Query Engine
Stardog’s Graph Query Engine is a core component of the Stardog platform that many other features, such as Voicebox, depend on, so its performance and robustness are essential. Many of the improvements made over the years are based on database research and on experience from other database products.
General query engine improvements
- Fuzzing for SPARQL correctness. Fuzz testing is a powerful method for randomized testing, e.g. in the SQL context. We are considering either investigating how the popular SQLancer approach can be applied to SPARQL or developing Stardog-specific techniques for SPARQL fuzzing; a minimal query-generator sketch follows below.
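As a starting point, here is a minimal sketch of a random SPARQL query generator over an invented vocabulary. A real fuzzer would also need a correctness oracle, for example metamorphic or differential testing in the spirit of SQLancer.

```python
# Minimal random SPARQL query generator as a starting point for fuzzing.
# The vocabulary is made up; checking the results (an oracle) is the hard part.
import random

PREDICATES = ["<urn:p1>", "<urn:p2>", "<urn:p3>"]
CONSTANTS = ["<urn:a>", "<urn:b>", '"42"']
VARS = ["?x", "?y", "?z"]

def random_query(max_patterns=4, seed=None):
    rnd = random.Random(seed)
    patterns = []
    for _ in range(rnd.randint(1, max_patterns)):
        s = rnd.choice(VARS)
        p = rnd.choice(PREDICATES)
        o = rnd.choice(VARS + CONSTANTS)
        patterns.append(f"  {s} {p} {o} .")
    body = "\n".join(patterns)
    # Occasionally wrap part of the pattern in OPTIONAL to exercise more of
    # the query planner.
    if rnd.random() < 0.3 and len(patterns) > 1:
        body = patterns[0] + "\n  OPTIONAL {\n" + "\n".join(patterns[1:]) + "\n  }"
    return "SELECT * WHERE {\n" + body + "\n}"

for i in range(3):
    print(random_query(seed=i), "\n")
```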
Query planning and optimization
- Cardinality Estimation over Knowledge Graphs with Embeddings and Graph Neural Networks. Accurate cardinality estimates are critically important for cost-based query optimization, particularly join ordering. The paper reports successfully using modern ML to estimate cardinalities in knowledge graphs.
- Wander Join and XDB: Online Aggregation via Random Walks. An interesting sampling approach for approximating joins and cardinality estimates; a toy sketch of the random-walk estimator follows after this list.
- Common sub-expression elimination. A very common optimization in SQL systems that is currently missing in Stardog. It needs to be adapted to identify repeated graph query patterns, so that they are only evaluated once and the results are reused.
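To illustrate the random-walk idea, the toy sketch below estimates the size of a two-step join by sampling walks and weighting each by the inverse of its sampling probability. The relations and data are made up, and this is a simplification of the full Wander Join algorithm.

```python
# Toy sketch of a Wander Join-style estimator for the size of a two-step
# join R1(a, b) JOIN R2(b, c). Each random walk is weighted by the inverse
# of its sampling probability (Horvitz-Thompson). The data is made up.
import random
from collections import defaultdict

R1 = [("a1", "b1"), ("a2", "b1"), ("a3", "b2"), ("a4", "b3")]
R2 = [("b1", "c1"), ("b1", "c2"), ("b2", "c3")]

# Index R2 on the join key so a walk can follow matching edges.
r2_index = defaultdict(list)
for b, c in R2:
    r2_index[b].append(c)

def estimate_join_size(walks=10_000, seed=0):
    rnd = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        _, b = rnd.choice(R1)               # step 1: sampled with prob 1/|R1|
        matches = r2_index.get(b, [])
        if not matches:
            continue                        # failed walk contributes 0
        rnd.choice(matches)                 # step 2: sampled with prob 1/|matches|
        total += len(R1) * len(matches)     # inverse-probability weight
    return total / walks

exact = sum(len(r2_index.get(b, [])) for _, b in R1)
print("exact:", exact, "estimate:", round(estimate_join_size(), 2))
```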
Query execution
- Encapsulation of parallelism in the Volcano query processing system. All Stardog query execution is currently single-threaded. The paper describes specific operators that adapt the pull-based execution model used in Stardog (the so-called Volcano model) to support intra-query parallelism; see the sketch after this list.
- Factorized (intermediate) query results (theory, see also a more practical application in graph databases). Compact representation of intermediate results helps, in certain scenarios, to avoid Cartesian explosions and dramatically improve performance.
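The sketch below illustrates the exchange-operator idea in miniature: a pull-based (Volcano-style) plan in which one operator runs its child in a background thread while exposing the usual iterator interface to its parent. It is a toy Python illustration, not Stardog's execution engine.

```python
# Toy sketch of the Volcano "exchange" idea: parallelism hidden behind the
# standard pull-based operator interface.
import threading
import queue

class Scan:
    """Leaf operator: yields rows from an in-memory source."""
    def __init__(self, rows):
        self.rows = rows
    def __iter__(self):
        yield from self.rows

class Filter:
    """Volcano-style operator: pulls from its child, emits matching rows."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def __iter__(self):
        for row in self.child:
            if self.predicate(row):
                yield row

class Exchange:
    """Runs its child in a separate thread and exposes the same pull API."""
    _DONE = object()
    def __init__(self, child, buffer_size=64):
        self.child = child
        self.q = queue.Queue(maxsize=buffer_size)
    def _produce(self):
        for row in self.child:
            self.q.put(row)
        self.q.put(self._DONE)
    def __iter__(self):
        threading.Thread(target=self._produce, daemon=True).start()
        while True:
            row = self.q.get()
            if row is self._DONE:
                return
            yield row

# A plan like Filter(Exchange(Scan(...))) overlaps scanning with filtering;
# downstream operators are unaware that their input is produced in parallel.
plan = Filter(Exchange(Scan(range(10))), predicate=lambda r: r % 2 == 0)
print(list(plan))   # [0, 2, 4, 6, 8]
```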
To Learn More or to Apply
Send an email to ai-internship@stardog.com and include the following:
- updated CV
- cover letter stating areas of interest (including new things we don’t mention above)
- time period of availability (starting & ending dates)
- your work location for the internship (remote preferred)