test: Use smaller files than Alice for tests. (#1474)

## Description
Alice in Wonderland is expensive to process, and Azure OpenAI flags it
as inappropriate content. The tests now use a smaller file about quantum
computers instead.

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Update vector search limit-none tests to use local quantum/NLP files
> and add `test_data/Quantum_computers.txt`, adjusting queries
> accordingly.
>
> - **Tests**:
>   - **Vector engine limit-none coverage**
>     (`test_vector_engine_search_none_limit`):
>     - Replace `examples/data/alice_in_wonderland.txt` with
>       `tests/test_data/Quantum_computers.txt` and
>       `tests/test_data/Natural_language_processing.txt` across
>       `cognee/tests/test_chromadb.py`, `cognee/tests/test_lancedb.py`, and
>       `cognee/tests/test_pgvector.py`.
>     - Update query to "Tell me about Quantum computers" and keep assertion
>       verifying no implicit `limit` (ensure `len(result) > 15`).
>   - **Test data**:
>     - Add `cognee/tests/test_data/Quantum_computers.txt`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
af1603c8f9. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
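
The limit-none assertion described in the summary can be illustrated with a toy in-memory search. This is only a sketch under stated assumptions: `search` and `cosine_similarity` are hypothetical stand-ins written for this example, not cognee's vector engine API. It shows why asserting `len(result) > 15` would catch a hidden default cap such as 5, 10, or 15.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    vectors: Sequence[Sequence[float]],
    query_vector: Sequence[float],
    limit: Optional[int] = None,
) -> List[Tuple[float, int]]:
    """Hypothetical stand-in for a vector search; NOT cognee's API.

    Returns (score, index) pairs, best match first.
    """
    scored = sorted(
        ((cosine_similarity(v, query_vector), i) for i, v in enumerate(vectors)),
        reverse=True,
    )
    # limit=None means "no cap": slicing with None keeps every hit.
    return scored[:limit]


# 20 stored vectors, more than any plausible default cap (5, 10, or 15).
vectors = [[1.0, float(i)] for i in range(20)]

results = search(vectors, [1.0, 0.0], limit=None)
assert len(results) > 15  # would fail if a default limit were applied silently
```

The fixture has to yield more than 15 hits for the assertion to be meaningful, which may be why the tests ingest two text files rather than one.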
Igor Ilic 2025-09-29 12:21:16 +02:00 committed by GitHub
commit 74f7a65110
5 changed files with 49 additions and 25 deletions


@@ -85,7 +85,7 @@ Self-hosted package:
 - Is highly customizable with custom tasks, pipelines, and a set of built-in search endpoints
 Hosted platform:
-- Includes a managed UI and a [hosted solution](www.cognee.ai)
+- Includes a managed UI and a [hosted solution](https://www.cognee.ai)


@@ -68,21 +68,25 @@ async def test_getting_of_documents(dataset_name_1):
 async def test_vector_engine_search_none_limit():
-    file_path = os.path.join(
-        pathlib.Path(__file__).resolve().parent.parent.parent,
-        "examples",
-        "data",
-        "alice_in_wonderland.txt",
+    file_path_quantum = os.path.join(
+        pathlib.Path(__file__).parent, "test_data/Quantum_computers.txt"
     )
+    file_path_nlp = os.path.join(
+        pathlib.Path(__file__).parent,
+        "test_data/Natural_language_processing.txt",
+    )
     await cognee.prune.prune_data()
     await cognee.prune.prune_system(metadata=True)
-    await cognee.add(file_path)
+    await cognee.add(file_path_quantum)
+    await cognee.add(file_path_nlp)
     await cognee.cognify()
-    query_text = "List me all the important characters in Alice in Wonderland."
+    query_text = "Tell me about Quantum computers"
     from cognee.infrastructure.databases.vector import get_vector_engine
@@ -96,7 +100,8 @@ async def test_vector_engine_search_none_limit():
         collection_name=collection_name, query_vector=query_vector, limit=None
     )
-    # Check that we did not accidentally use any default value for limit in vector search along the way (like 5, 10, or 15)
+    # Check that we did not accidentally use any default value for limit
+    # in vector search along the way (like 5, 10, or 15)
     assert len(result) > 15


@@ -0,0 +1,9 @@
A quantum computer is a computer that takes advantage of quantum mechanical phenomena.
At small scales, physical matter exhibits properties of both particles and waves, and quantum computing leverages this behavior, specifically quantum superposition and entanglement, using specialized hardware that supports the preparation and manipulation of quantum states.
Classical physics cannot explain the operation of these quantum devices, and a scalable quantum computer could perform some calculations exponentially faster (with respect to input size scaling) than any modern "classical" computer. In particular, a large-scale quantum computer could break widely used encryption schemes and aid physicists in performing physical simulations; however, the current state of the technology is largely experimental and impractical, with several obstacles to useful applications. Moreover, scalable quantum computers do not hold promise for many practical tasks, and for many important tasks quantum speedups are proven impossible.
The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics. Unlike a classical bit, a qubit can exist in a superposition of its two "basis" states. When measuring a qubit, the result is a probabilistic output of a classical bit, therefore making quantum computers nondeterministic in general. If a quantum computer manipulates the qubit in a particular way, wave interference effects can amplify the desired measurement results. The design of quantum algorithms involves creating procedures that allow a quantum computer to perform calculations efficiently and quickly.
Physically engineering high-quality qubits has proven challenging. If a physical qubit is not sufficiently isolated from its environment, it suffers from quantum decoherence, introducing noise into calculations. Paradoxically, perfectly isolating qubits is also undesirable because quantum computations typically need to initialize qubits, perform controlled qubit interactions, and measure the resulting quantum states. Each of those operations introduces errors and suffers from noise, and such inaccuracies accumulate.
In principle, a non-quantum (classical) computer can solve the same computational problems as a quantum computer, given enough time. Quantum advantage comes in the form of time complexity rather than computability, and quantum complexity theory shows that some quantum algorithms for carefully selected tasks require exponentially fewer computational steps than the best known non-quantum algorithms. Such tasks can in theory be solved on a large-scale quantum computer whereas classical computers would not finish computations in any reasonable amount of time. However, quantum speedup is not universal or even typical across computational tasks, since basic tasks such as sorting are proven to not allow any asymptotic quantum speedup. Claims of quantum supremacy have drawn significant attention to the discipline, but are demonstrated on contrived tasks, while near-term practical use cases remain limited.
Emerging error-correcting codes aim to mitigate decoherence effects and are expected to pave the way for fault-tolerant quantum processors. Laboratories across the globe are investigating diverse qubit implementations, such as superconducting circuits, trapped ions, neutral atoms, and photonic systems. Significant government funding and private investment have created an ecosystem of startups and consortia focused on accelerating quantum hardware and software development. Universities are meanwhile launching interdisciplinary programs that teach physics, computer science, and engineering concepts necessary for tomorrow's quantum workforce. Establishing reliable benchmarking standards will be essential for objectively comparing devices and charting realistic milestones toward practical quantum advantage.
Industry roadmaps anticipate that achieving error rates below the threshold for surface codes will require millions of physical qubits per logical qubit, highlighting daunting scale challenges. Researchers are therefore exploring hardware-software co-design strategies, where algorithmic breakthroughs and device engineering progress hand in hand to minimize overhead. Hybrid quantum-classical workflows, exemplified by variational algorithms running on near-term devices, offer a pragmatic path to extracting value before full fault tolerance arrives. Meanwhile, cryptographers are advancing post-quantum encryption schemes to safeguard information in a future where Shor's algorithm becomes practical. The interplay between theoretical advances, experimental ingenuity, and policy considerations will ultimately determine how transformative quantum computing becomes for science, industry, and society.
Collaborative open-source toolkits are lowering the barrier to entry for developers eager to prototype quantum algorithms and simulate small devices on classical hardware. As these software frameworks mature, they will foster standardization of gate libraries, circuit optimization passes, and error-mitigation techniques. At the same time, advances in cryogenic engineering, vacuum systems, and photonics are steadily improving the stability and manufacturability of next-generation qubit platforms. Policymakers are beginning to craft export controls and ethical guidelines aimed at preventing misuse while encouraging international collaboration in fundamental research. Ultimately, the success of quantum technology will hinge on integrating robust hardware, intelligent software, and a skilled workforce within an environment of responsible governance.


@@ -68,21 +68,25 @@ async def test_getting_of_documents(dataset_name_1):
 async def test_vector_engine_search_none_limit():
-    file_path = os.path.join(
-        pathlib.Path(__file__).resolve().parent.parent.parent,
-        "examples",
-        "data",
-        "alice_in_wonderland.txt",
+    file_path_quantum = os.path.join(
+        pathlib.Path(__file__).parent, "test_data/Quantum_computers.txt"
     )
+    file_path_nlp = os.path.join(
+        pathlib.Path(__file__).parent,
+        "test_data/Natural_language_processing.txt",
+    )
     await cognee.prune.prune_data()
     await cognee.prune.prune_system(metadata=True)
-    await cognee.add(file_path)
+    await cognee.add(file_path_quantum)
+    await cognee.add(file_path_nlp)
     await cognee.cognify()
-    query_text = "List me all the important characters in Alice in Wonderland."
+    query_text = "Tell me about Quantum computers"
     from cognee.infrastructure.databases.vector import get_vector_engine
@@ -96,7 +100,8 @@ async def test_vector_engine_search_none_limit():
         collection_name=collection_name, query_vector=query_vector, limit=None
     )
-    # Check that we did not accidentally use any default value for limit in vector search along the way (like 5, 10, or 15)
+    # Check that we did not accidentally use any default value for limit
+    # in vector search along the way (like 5, 10, or 15)
     assert len(result) > 15


@@ -69,21 +69,25 @@ async def test_getting_of_documents(dataset_name_1):
 async def test_vector_engine_search_none_limit():
-    file_path = os.path.join(
-        pathlib.Path(__file__).resolve().parent.parent.parent,
-        "examples",
-        "data",
-        "alice_in_wonderland.txt",
+    file_path_quantum = os.path.join(
+        pathlib.Path(__file__).parent, "test_data/Quantum_computers.txt"
     )
+    file_path_nlp = os.path.join(
+        pathlib.Path(__file__).parent,
+        "test_data/Natural_language_processing.txt",
+    )
     await cognee.prune.prune_data()
     await cognee.prune.prune_system(metadata=True)
-    await cognee.add(file_path)
+    await cognee.add(file_path_quantum)
+    await cognee.add(file_path_nlp)
     await cognee.cognify()
-    query_text = "List me all the important characters in Alice in Wonderland."
+    query_text = "Tell me about Quantum computers"
     from cognee.infrastructure.databases.vector import get_vector_engine
@@ -97,7 +101,8 @@ async def test_vector_engine_search_none_limit():
         collection_name=collection_name, query_vector=query_vector, limit=None
     )
-    # Check that we did not accidentally use any default value for limit in vector search along the way (like 5, 10, or 15)
+    # Check that we did not accidentally use any default value for limit
+    # in vector search along the way (like 5, 10, or 15)
     assert len(result) > 15
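
The three test files build the same two fixture paths relative to the test module's directory. A minimal sketch of how that lookup could be shared, assuming a hypothetical helper named `fixture_path` (it is not part of this commit or of cognee), might look like this. It also fails early with a clear message when a fixture is missing, rather than letting the error surface later during ingestion.

```python
import tempfile
from pathlib import Path


def fixture_path(test_dir, filename: str) -> Path:
    """Hypothetical helper: resolve <test_dir>/test_data/<filename>.

    Raises FileNotFoundError if the fixture file does not exist.
    """
    path = Path(test_dir) / "test_data" / filename
    if not path.is_file():
        raise FileNotFoundError(f"Missing test fixture: {path}")
    return path


# Demonstration against a temporary directory that mimics the layout
# used in the diffs (a test_data folder next to the test module).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "test_data"
    data_dir.mkdir()
    (data_dir / "Quantum_computers.txt").write_text("A quantum computer ...")

    resolved = fixture_path(tmp, "Quantum_computers.txt")
    assert resolved.name == "Quantum_computers.txt"
```

In a real test module the first argument would be `Path(__file__).parent`, matching the expression used in the diffs above.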